
Taxonomic identification accuracy from BOLD and GenBank databases using over a thousand insect DNA barcodes from Colombia

Recent declines of insect populations at high rates have resulted in the need to develop a quick method to determine their diversity and to process massive data for the identification of species of highly diverse groups. A short sequence of DNA from COI is widely used for insect identification by co...


Bibliographic Details
Main authors: Baena-Bejarano, Nathalie, Reina, Catalina, Martínez-Revelo, Diego Esteban, Medina, Claudia A., Tovar, Eduardo, Uribe-Soto, Sandra, Neita-Moreno, Jhon Cesar, Gonzalez, Mailyn A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10124890/
https://www.ncbi.nlm.nih.gov/pubmed/37093820
http://dx.doi.org/10.1371/journal.pone.0277379
author Baena-Bejarano, Nathalie
Reina, Catalina
Martínez-Revelo, Diego Esteban
Medina, Claudia A.
Tovar, Eduardo
Uribe-Soto, Sandra
Neita-Moreno, Jhon Cesar
Gonzalez, Mailyn A.
author_facet Baena-Bejarano, Nathalie
Reina, Catalina
Martínez-Revelo, Diego Esteban
Medina, Claudia A.
Tovar, Eduardo
Uribe-Soto, Sandra
Neita-Moreno, Jhon Cesar
Gonzalez, Mailyn A.
author_sort Baena-Bejarano, Nathalie
collection PubMed
description Recent declines of insect populations at high rates have resulted in the need to develop a quick method to determine their diversity and to process massive data for the identification of species of highly diverse groups. A short sequence of DNA from COI is widely used for insect identification by comparing it against sequences of known species. Repositories of sequences are available online with tools that facilitate matching of the sequences of interest to a known individual. However, the performance of these tools can differ. Here we aim to assess the accuracy in identification of insect taxonomic categories from two repositories, BOLD Systems and GenBank. This was done by comparing the sequence matches between the taxonomist identification and the suggested identification from the platforms. We used 1,160 COI sequences representing eight orders of insects from Colombia. After the comparison, we reanalyzed the results from a representative subset of the data from the subfamily Scarabaeinae (Coleoptera). Overall, BOLD Systems outperformed GenBank, and the performance of both engines differed by orders and other taxonomic categories (species, genus and family). Higher rates of accurate identification were obtained at family and genus levels. The accuracy was higher in BOLD for the order Coleoptera at family level, and for Coleoptera and Lepidoptera at genus and species levels. Other orders performed similarly in both repositories. Moreover, the Scarabaeinae subset showed that species were correctly identified only when BOLD match percentage was above 93.4% and a total of 85% of the samples were correctly assigned to a taxonomic category. These results accentuate the great potential of the identification engines to place insects accurately into their respective taxonomic categories based on DNA barcodes and highlight the reliability of BOLD Systems for insect identification in the absence of a large reference database for a highly diverse country.
format Online
Article
Text
id pubmed-10124890
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-101248902023-04-25 Taxonomic identification accuracy from BOLD and GenBank databases using over a thousand insect DNA barcodes from Colombia Baena-Bejarano, Nathalie Reina, Catalina Martínez-Revelo, Diego Esteban Medina, Claudia A. Tovar, Eduardo Uribe-Soto, Sandra Neita-Moreno, Jhon Cesar Gonzalez, Mailyn A. PLoS One Research Article Recent declines of insect populations at high rates have resulted in the need to develop a quick method to determine their diversity and to process massive data for the identification of species of highly diverse groups. A short sequence of DNA from COI is widely used for insect identification by comparing it against sequences of known species. Repositories of sequences are available online with tools that facilitate matching of the sequences of interest to a known individual. However, the performance of these tools can differ. Here we aim to assess the accuracy in identification of insect taxonomic categories from two repositories, BOLD Systems and GenBank. This was done by comparing the sequence matches between the taxonomist identification and the suggested identification from the platforms. We used 1,160 COI sequences representing eight orders of insects from Colombia. After the comparison, we reanalyzed the results from a representative subset of the data from the subfamily Scarabaeinae (Coleoptera). Overall, BOLD systems outperformed GenBank, and the performance of both engines differed by orders and other taxonomic categories (species, genus and family). Higher rates of accurate identification were obtained at family and genus levels. The accuracy was higher in BOLD for the order Coleoptera at family level, for Coleoptera and Lepidoptera at genus and species level. Other orders performed similarly in both repositories. 
Moreover, the Scarabaeinae subset showed that species were correctly identified only when BOLD match percentage was above 93.4% and a total of 85% of the samples were correctly assigned to a taxonomic category. These results accentuate the great potential of the identification engines to place insects accurately into their respective taxonomic categories based on DNA barcodes and highlight the reliability of BOLD Systems for insect identification in the absence of a large reference database for a highly diverse country. Public Library of Science 2023-04-24 /pmc/articles/PMC10124890/ /pubmed/37093820 http://dx.doi.org/10.1371/journal.pone.0277379 Text en © 2023 Baena-Bejarano et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Baena-Bejarano, Nathalie
Reina, Catalina
Martínez-Revelo, Diego Esteban
Medina, Claudia A.
Tovar, Eduardo
Uribe-Soto, Sandra
Neita-Moreno, Jhon Cesar
Gonzalez, Mailyn A.
Taxonomic identification accuracy from BOLD and GenBank databases using over a thousand insect DNA barcodes from Colombia
title Taxonomic identification accuracy from BOLD and GenBank databases using over a thousand insect DNA barcodes from Colombia
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10124890/
https://www.ncbi.nlm.nih.gov/pubmed/37093820
http://dx.doi.org/10.1371/journal.pone.0277379