Over 2.5 million COI sequences in GenBank and growing

The increasing popularity of cytochrome c oxidase subunit 1 (COI) DNA metabarcoding warrants a careful look at the underlying reference databases used to make high-throughput taxonomic assignments. The objectives of this study are to document trends and assess the future usability of COI records for...

Bibliographic Details
Main Authors:	Porter, Teresita M., Hajibabaei, Mehrdad
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2018
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6128447/ https://www.ncbi.nlm.nih.gov/pubmed/30192752 http://dx.doi.org/10.1371/journal.pone.0200177

_version_	1783353644039536640
author	Porter, Teresita M. Hajibabaei, Mehrdad
author_facet	Porter, Teresita M. Hajibabaei, Mehrdad
author_sort	Porter, Teresita M.
collection	PubMed
description	The increasing popularity of cytochrome c oxidase subunit 1 (COI) DNA metabarcoding warrants a careful look at the underlying reference databases used to make high-throughput taxonomic assignments. The objectives of this study are to document trends and assess the future usability of COI records for metabarcode identification. The number of COI records deposited to the NCBI nucleotide database has increased by a geometric average of 51% per year, from 8,137 records deposited in 2003 to a cumulative total of ~ 2.5 million by the end of 2017. About half of these records are fully identified to the species rank, 92% are at least 500 bp in length, 74% have a country annotation, and 51% have latitude-longitude annotations. To ensure the future usability of COI records in GenBank we suggest: 1) Improving the geographic representation of COI records, 2) Improving the cross-referencing of COI records in the Barcode of Life Data System and GenBank to facilitate consolidation and incorporation into existing bioinformatic pipelines, 3) Adherence to the minimum information about a marker gene sequence guidelines, and 4) Integrating metabarcodes from eDNA and mixed community studies with existing reference sequences. The growth of COI reference records over the past 15 years has been substantial and is likely to be a resource across many fields for years to come.
format	Online Article Text
id	pubmed-6128447
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-6128447 2018-09-15 Over 2.5 million COI sequences in GenBank and growing Porter, Teresita M. Hajibabaei, Mehrdad PLoS One Research Article
Public Library of Science 2018-09-07 /pmc/articles/PMC6128447/ /pubmed/30192752 http://dx.doi.org/10.1371/journal.pone.0200177 Text en © 2018 Porter, Hajibabaei http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Porter, Teresita M. Hajibabaei, Mehrdad Over 2.5 million COI sequences in GenBank and growing
title	Over 2.5 million COI sequences in GenBank and growing
title_sort	over 2.5 million coi sequences in genbank and growing
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6128447/ https://www.ncbi.nlm.nih.gov/pubmed/30192752 http://dx.doi.org/10.1371/journal.pone.0200177
work_keys_str_mv	AT porterteresitam over25millioncoisequencesingenbankandgrowing AT hajibabaeimehrdad over25millioncoisequencesingenbankandgrowing
