A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator
The Cytochrome C Oxidase subunit I gene (“COI”) is the de facto standard for animal DNA barcoding. Organism identification based on COI requires an accurate and extensive annotated database of COI sequences. Such a database can also be of value in reconstructing evolutionary history and in diversity...
Main Authors: | Heller, Philip; Casaletto, James; Ruiz, Gregory; Geller, Jonathan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group 2018 |
Subjects: | Data Descriptor |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6080493/ https://www.ncbi.nlm.nih.gov/pubmed/30084847 http://dx.doi.org/10.1038/sdata.2018.156 |
_version_ | 1783345487725723648 |
---|---|
author | Heller, Philip Casaletto, James Ruiz, Gregory Geller, Jonathan |
author_facet | Heller, Philip Casaletto, James Ruiz, Gregory Geller, Jonathan |
author_sort | Heller, Philip |
collection | PubMed |
description | The Cytochrome C Oxidase subunit I gene (“COI”) is the de facto standard for animal DNA barcoding. Organism identification based on COI requires an accurate and extensive annotated database of COI sequences. Such a database can also be of value in reconstructing evolutionary history and in diversity studies. Two COI databases are currently available: BOLD and Midori. BOLD’s submissions conform to stringent sequence and metadata requirements; BOLD is specific to COI but makes no attempt to be comprehensive. Midori, derived from GenBank, has more sequences but less stringent standards than BOLD, resulting in higher error rates. To address the need for a comprehensive and accurate COI database, we adapted the ARBitrator algorithm, which classifies based only on sequence properties and has successfully auto-curated bacterial genes mined from GenBank. The adapted algorithm, which we call CO-ARBitrator, built a database of over a million metazoan COI sequences. Sensitivity and specificity are significantly higher than Midori. Specificity is comparable to what BOLD achieves with data quality prerequisites. Results and software are publicly available. |
format | Online Article Text |
id | pubmed-6080493 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-60804932018-08-16 A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator Heller, Philip Casaletto, James Ruiz, Gregory Geller, Jonathan Sci Data Data Descriptor The Cytochrome C Oxidase subunit I gene (“COI”) is the de facto standard for animal DNA barcoding. Organism identification based on COI requires an accurate and extensive annotated database of COI sequences. Such a database can also be of value in reconstructing evolutionary history and in diversity studies. Two COI databases are currently available: BOLD and Midori. BOLD’s submissions conform to stringent sequence and metadata requirements; BOLD is specific to COI but makes no attempt to be comprehensive. Midori, derived from GenBank, has more sequences but less stringent standards than BOLD, resulting in higher error rates. To address the need for a comprehensive and accurate COI database, we adapted the ARBitrator algorithm, which classifies based only on sequence properties and has successfully auto-curated bacterial genes mined from GenBank. The adapted algorithm, which we call CO-ARBitrator, built a database of over a million metazoan COI sequences. Sensitivity and specificity are significantly higher than Midori. Specificity is comparable to what BOLD achieves with data quality prerequisites. Results and software are publicly available. Nature Publishing Group 2018-08-07 /pmc/articles/PMC6080493/ /pubmed/30084847 http://dx.doi.org/10.1038/sdata.2018.156 Text en Copyright © 2018, The Author(s) http://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article. |
title | A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6080493/ https://www.ncbi.nlm.nih.gov/pubmed/30084847 http://dx.doi.org/10.1038/sdata.2018.156 |