GenBank is a reliable resource for 21st century biodiversity research

Bibliographic Details
Main Authors: Leray, Matthieu; Knowlton, Nancy; Ho, Shian-Lei; Nguyen, Bryan N.; Machida, Ryuji J.
Format: Online, Article, Text
Language: English
Published: National Academy of Sciences, 2019
Subjects: Biological Sciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6842603/
https://www.ncbi.nlm.nih.gov/pubmed/31636175
http://dx.doi.org/10.1073/pnas.1911714116
collection PubMed
description Traditional methods of characterizing biodiversity are increasingly being supplemented and replaced by approaches based on DNA sequencing alone. These approaches commonly involve extraction and high-throughput sequencing of bulk samples from biologically complex communities or samples of environmental DNA (eDNA). In such cases, vouchers for individual organisms are rarely obtained, often unidentifiable, or unavailable. Thus, identifying these sequences typically relies on comparisons with sequences from genetic databases, particularly GenBank. While concerns have been raised about biases and inaccuracies in laboratory and analytical methods, comparatively little attention has been paid to the taxonomic reliability of GenBank itself. Here we analyze the metazoan mitochondrial sequences of GenBank using a combination of distance-based clustering and phylogenetic analysis. Because of their comparatively rapid evolutionary rates and consequent high taxonomic resolution, mitochondrial sequences represent an invaluable resource for the detection of the many small and often undescribed organisms that represent the bulk of animal diversity. We show that metazoan identifications in GenBank are surprisingly accurate, even at low taxonomic levels (likely <1% error rate at the genus level). This stands in contrast to previously voiced concerns based on limited analyses of particular groups and the fact that individual researchers currently submit annotated sequences to GenBank without significant external taxonomic validation. Our encouraging results suggest that the rapid uptake of DNA-based approaches is supported by a bioinformatic infrastructure capable of assessing both the losses to biodiversity caused by global change and the effectiveness of conservation efforts aimed at slowing or reversing these losses.
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Proc Natl Acad Sci U S A (section: Biological Sciences)
dates 2019-11-05, 2019-10-21
rights Copyright © 2019 the Author(s). Published by PNAS. This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY) (http://creativecommons.org/licenses/by/4.0/).
topic Biological Sciences