Taxonomy annotation and guide tree errors in 16S rRNA databases

Sequencing of the 16S ribosomal RNA (rRNA) gene is widely used to survey microbial communities. Specialized 16S rRNA databases have been developed to support this approach including Greengenes, RDP and SILVA. Most taxonomy annotations in these databases are predictions from sequence rather than auth...

Bibliographic Details
Main author: Edgar, Robert
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6003391/
https://www.ncbi.nlm.nih.gov/pubmed/29910992
http://dx.doi.org/10.7717/peerj.5030
_version_ 1783332357945688064
author Edgar, Robert
author_facet Edgar, Robert
author_sort Edgar, Robert
collection PubMed
description Sequencing of the 16S ribosomal RNA (rRNA) gene is widely used to survey microbial communities. Specialized 16S rRNA databases have been developed to support this approach including Greengenes, RDP and SILVA. Most taxonomy annotations in these databases are predictions from sequence rather than authoritative assignments based on studies of type strains or isolates. In this work, I investigated the taxonomy annotations and guide trees provided by these databases. Using a blinded test, I estimated that the annotation error rate of the RDP database is ∼10%. The branching orders of the Greengenes and SILVA guide trees were found to disagree at comparable rates with each other and with taxonomy annotations according to the training set (authoritative reference) provided by RDP, indicating that the trees have comparable quality. Pervasive conflicts between tree branching order and type strain taxonomies strongly suggest that the guide trees are unreliable guides to phylogeny. I found 249,490 identical sequences with conflicting annotations in SILVA v128 and Greengenes v13.5 at ranks up to phylum (7,804 conflicts), indicating that the annotation error rate in these databases is ∼17%.
format Online
Article
Text
id pubmed-6003391
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6003391 2018-06-15 Taxonomy annotation and guide tree errors in 16S rRNA databases Edgar, Robert PeerJ Bioinformatics PeerJ Inc. 2018-06-12 /pmc/articles/PMC6003391/ /pubmed/29910992 http://dx.doi.org/10.7717/peerj.5030 Text en © 2018 Edgar http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Edgar, Robert
Taxonomy annotation and guide tree errors in 16S rRNA databases
title Taxonomy annotation and guide tree errors in 16S rRNA databases
title_full Taxonomy annotation and guide tree errors in 16S rRNA databases
title_fullStr Taxonomy annotation and guide tree errors in 16S rRNA databases
title_full_unstemmed Taxonomy annotation and guide tree errors in 16S rRNA databases
title_short Taxonomy annotation and guide tree errors in 16S rRNA databases
title_sort taxonomy annotation and guide tree errors in 16s rrna databases
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6003391/
https://www.ncbi.nlm.nih.gov/pubmed/29910992
http://dx.doi.org/10.7717/peerj.5030
work_keys_str_mv AT edgarrobert taxonomyannotationandguidetreeerrorsin16srrnadatabases