BOLD and GenBank revisited – Do identification errors arise in the lab or in the sequence libraries?

Bibliographic Details
Main authors: Pentinsaari, Mikko, Ratnasingham, Sujeevan, Miller, Scott E., Hebert, Paul D. N.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7162515/
https://www.ncbi.nlm.nih.gov/pubmed/32298363
http://dx.doi.org/10.1371/journal.pone.0231814
collection PubMed
description Applications of biological knowledge, such as forensics, often require the determination of biological materials to a species level. As such, DNA-based approaches to identification, particularly DNA barcoding, are attracting increased interest. The capacity of DNA barcodes to assign newly encountered specimens to a species relies upon access to informatics platforms, such as BOLD and GenBank, which host libraries of reference sequences and support the comparison of new sequences to them. As parameterization of these libraries expands, DNA barcoding has the potential to make valuable contributions in diverse applied contexts. However, a recent publication called for caution after finding that both platforms performed poorly in identifying specimens of 17 common insect species. This study follows up on this concern by asking if the misidentifications reflected problems in the reference libraries or in the query sequences used to test them. Because this reanalysis revealed that missteps in acquiring and analyzing the query sequences were responsible for most misidentifications, a workflow is described to minimize such errors in future investigations. The present study also revealed the limitations imposed by the lack of a polished species-level taxonomy for many groups. In such cases, applications can be strengthened by mapping the geographic distributions of sequence-based species proxies rather than waiting for the maturation of formal taxonomic systems based on morphology.
id pubmed-7162515
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: PLoS One, Research Article (pubmed-7162515, 2020-04-21)
Published 2020-04-16 by Public Library of Science. © 2020 Pentinsaari et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.