
DNA barcode data accurately assign higher spider taxa

The use of unique DNA sequences as a method for taxonomic identification is no longer fundamentally controversial, even though debate continues on the best markers, methods, and technology to use. Although both existing databanks such as GenBank and BOLD, as well as reference taxonomies, are imperfe...


Bibliographic Details
Main authors: Coddington, Jonathan A., Agnarsson, Ingi, Cheng, Ren-Chung, Čandek, Klemen, Driskell, Amy, Frick, Holger, Gregorič, Matjaž, Kostanjšek, Rok, Kropf, Christian, Kweskin, Matthew, Lokovšek, Tjaša, Pipan, Miha, Vidergar, Nina, Kuntner, Matjaž
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2016
Subjects: Biodiversity
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4958005/
https://www.ncbi.nlm.nih.gov/pubmed/27547527
http://dx.doi.org/10.7717/peerj.2201
author Coddington, Jonathan A.
Agnarsson, Ingi
Cheng, Ren-Chung
Čandek, Klemen
Driskell, Amy
Frick, Holger
Gregorič, Matjaž
Kostanjšek, Rok
Kropf, Christian
Kweskin, Matthew
Lokovšek, Tjaša
Pipan, Miha
Vidergar, Nina
Kuntner, Matjaž
collection PubMed
description The use of unique DNA sequences as a method for taxonomic identification is no longer fundamentally controversial, even though debate continues on the best markers, methods, and technology to use. Although both existing databanks such as GenBank and BOLD, as well as reference taxonomies, are imperfect, in best case scenarios “barcodes” (whether single or multiple, organelle or nuclear, loci) clearly are an increasingly fast and inexpensive method of identification, especially as compared to manual identification of unknowns by increasingly rare expert taxonomists. Because most species on Earth are undescribed, a complete reference database at the species level is impractical in the near term. The question therefore arises whether unidentified species can, using DNA barcodes, be accurately assigned to more inclusive groups such as genera and families—taxonomic ranks of putatively monophyletic groups for which the global inventory is more complete and stable. We used a carefully chosen test library of CO1 sequences from 49 families, 313 genera, and 816 species of spiders to assess the accuracy of genus and family-level assignment. We used BLAST queries of each sequence against the entire library and got the top ten hits. The percent sequence identity was reported from these hits (PIdent, range 75–100%). Accurate assignment of higher taxa (PIdent above which errors totaled less than 5%) occurred for genera at PIdent values >95 and families at PIdent values ≥ 91, suggesting these as heuristic thresholds for accurate generic and familial identifications in spiders. Accuracy of identification increases with numbers of species/genus and genera/family in the library; above five genera per family and fifteen species per genus all higher taxon assignments were correct. We propose that using percent sequence identity between conventional barcode sequences may be a feasible and reasonably accurate method to identify animals to family/genus. 
However, the quality of the underlying database impacts accuracy of results; many outliers in our dataset could be attributed to taxonomic and/or sequencing errors in BOLD and GenBank. It seems that an accurate and complete reference library of families and genera of life could provide accurate higher level taxonomic identifications cheaply and accessibly, within years rather than decades.
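The thresholds reported in the description can be read as a simple decision rule. The following is a minimal sketch only, not the authors' pipeline: it assumes the BLAST hits for one query have already been parsed into records, and that the single best hit decides the assignment (the description says the top ten hits were retrieved but does not say how they were combined). The `Hit` class, the `assign_higher_taxa` function, and the example records are hypothetical names introduced for illustration; only the two cut-offs (genus at PIdent > 95, family at PIdent ≥ 91) come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Heuristic cut-offs taken from the description above.
GENUS_MIN_PIDENT = 95.0   # genus is accepted only when PIdent > 95
FAMILY_MIN_PIDENT = 91.0  # family is accepted when PIdent >= 91


@dataclass(frozen=True)
class Hit:
    """One BLAST hit against the reference library (hypothetical record)."""
    subject_id: str
    pident: float  # percent sequence identity reported by BLAST
    genus: str
    family: str


def assign_higher_taxa(hits: List[Hit]) -> Tuple[Optional[str], Optional[str]]:
    """Return (genus, family) for a query; None where the best hit is too weak.

    Assumption: only the hit with the highest PIdent is consulted.
    """
    if not hits:
        return None, None
    best = max(hits, key=lambda h: h.pident)
    genus = best.genus if best.pident > GENUS_MIN_PIDENT else None
    family = best.family if best.pident >= FAMILY_MIN_PIDENT else None
    return genus, family


if __name__ == "__main__":
    # Illustrative values, not data from the study.
    example_hits = [
        Hit("ref-001", 93.4, "Araneus", "Araneidae"),
        Hit("ref-002", 89.7, "Argiope", "Araneidae"),
    ]
    print(assign_higher_taxa(example_hits))
```

With the illustrative hits shown, the best PIdent (93.4) clears the family cut-off but not the genus cut-off, so the sketch reports a family and leaves the genus unassigned. Note that the genus comparison is strict while the family comparison is inclusive, mirroring the ">95" and "≥ 91" wording of the description.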
format Online
Article
Text
id pubmed-4958005
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-4958005 2016-08-19 PeerJ Biodiversity PeerJ Inc. 2016-07-20 /pmc/articles/PMC4958005/ /pubmed/27547527 http://dx.doi.org/10.7717/peerj.2201 Text en ©2016 Coddington et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title DNA barcode data accurately assign higher spider taxa
topic Biodiversity
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4958005/
https://www.ncbi.nlm.nih.gov/pubmed/27547527
http://dx.doi.org/10.7717/peerj.2201