
Species abundance information improves sequence taxonomy classification accuracy

Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always viola...


Bibliographic Details
Main authors: Kaehler, Benjamin D., Bokulich, Nicholas A., McDonald, Daniel, Knight, Rob, Caporaso, J. Gregory, Huttley, Gavin A.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6789115/
https://www.ncbi.nlm.nih.gov/pubmed/31604942
http://dx.doi.org/10.1038/s41467-019-12669-6
_version_ 1783458580209336320
author Kaehler, Benjamin D.
Bokulich, Nicholas A.
McDonald, Daniel
Knight, Rob
Caporaso, J. Gregory
Huttley, Gavin A.
author_sort Kaehler, Benjamin D.
collection PubMed
description Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.
format Online
Article
Text
id pubmed-6789115
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6789115 2019-10-15 Species abundance information improves sequence taxonomy classification accuracy Kaehler, Benjamin D. Bokulich, Nicholas A. McDonald, Daniel Knight, Rob Caporaso, J. Gregory Huttley, Gavin A. Nat Commun Article Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments. Nature Publishing Group UK 2019-10-11 /pmc/articles/PMC6789115/ /pubmed/31604942 http://dx.doi.org/10.1038/s41467-019-12669-6 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material.
If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Species abundance information improves sequence taxonomy classification accuracy
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6789115/
https://www.ncbi.nlm.nih.gov/pubmed/31604942
http://dx.doi.org/10.1038/s41467-019-12669-6