
A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator

The Cytochrome C Oxidase subunit I gene (“COI”) is the de facto standard for animal DNA barcoding. Organism identification based on COI requires an accurate and extensive annotated database of COI sequences. Such a database can also be of value in reconstructing evolutionary history and in diversity...


Bibliographic Details
Main Authors: Heller, Philip; Casaletto, James; Ruiz, Gregory; Geller, Jonathan
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6080493/
https://www.ncbi.nlm.nih.gov/pubmed/30084847
http://dx.doi.org/10.1038/sdata.2018.156
_version_ 1783345487725723648
author Heller, Philip
Casaletto, James
Ruiz, Gregory
Geller, Jonathan
author_facet Heller, Philip
Casaletto, James
Ruiz, Gregory
Geller, Jonathan
author_sort Heller, Philip
collection PubMed
description The Cytochrome C Oxidase subunit I gene (“COI”) is the de facto standard for animal DNA barcoding. Organism identification based on COI requires an accurate and extensive annotated database of COI sequences. Such a database can also be of value in reconstructing evolutionary history and in diversity studies. Two COI databases are currently available: BOLD and Midori. BOLD’s submissions conform to stringent sequence and metadata requirements; BOLD is specific to COI but makes no attempt to be comprehensive. Midori, derived from GenBank, has more sequences but less stringent standards than BOLD, resulting in higher error rates. To address the need for a comprehensive and accurate COI database, we adapted the ARBitrator algorithm, which classifies based only on sequence properties and has successfully auto-curated bacterial genes mined from GenBank. The adapted algorithm, which we call CO-ARBitrator, built a database of over a million metazoan COI sequences. Sensitivity and specificity are significantly higher than Midori. Specificity is comparable to what BOLD achieves with data quality prerequisites. Results and software are publicly available.
format Online
Article
Text
id pubmed-6080493
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-60804932018-08-16 A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator Heller, Philip Casaletto, James Ruiz, Gregory Geller, Jonathan Sci Data Data Descriptor The Cytochrome C Oxidase subunit I gene (“COI”) is the de facto standard for animal DNA barcoding. Organism identification based on COI requires an accurate and extensive annotated database of COI sequences. Such a database can also be of value in reconstructing evolutionary history and in diversity studies. Two COI databases are currently available: BOLD and Midori. BOLD’s submissions conform to stringent sequence and metadata requirements; BOLD is specific to COI but makes no attempt to be comprehensive. Midori, derived from GenBank, has more sequences but less stringent standards than BOLD, resulting in higher error rates. To address the need for a comprehensive and accurate COI database, we adapted the ARBitrator algorithm, which classifies based only on sequence properties and has successfully auto-curated bacterial genes mined from GenBank. The adapted algorithm, which we call CO-ARBitrator, built a database of over a million metazoan COI sequences. Sensitivity and specificity are significantly higher than Midori. Specificity is comparable to what BOLD achieves with data quality prerequisites. Results and software are publicly available. Nature Publishing Group 2018-08-07 /pmc/articles/PMC6080493/ /pubmed/30084847 http://dx.doi.org/10.1038/sdata.2018.156 Text en Copyright © 2018, The Author(s) http://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article.
spellingShingle Data Descriptor
Heller, Philip
Casaletto, James
Ruiz, Gregory
Geller, Jonathan
A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator
title A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator
title_full A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator
title_fullStr A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator
title_full_unstemmed A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator
title_short A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator
title_sort database of metazoan cytochrome c oxidase subunit i gene sequences derived from genbank with co-arbitrator
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6080493/
https://www.ncbi.nlm.nih.gov/pubmed/30084847
http://dx.doi.org/10.1038/sdata.2018.156