Species abundance information improves sequence taxonomy classification accuracy
Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always viola...
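The effect of the uniform-prior assumption described in the summary can be illustrated with a toy example. The following is a minimal sketch, not the method or API of q2-clawback or of any existing classifier: it assumes a hand-rolled multinomial naive Bayes over 3-mers, and the taxon names (`species_A`, `species_B`), reference sequences, query, and abundance weights are all hypothetical values chosen for illustration. It shows how the same query can receive a different label once class priors reflect expected abundance rather than being uniform.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return the overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k=3, alpha=1.0):
    """Fit per-taxon k-mer log-probabilities with add-alpha smoothing."""
    vocab = sorted({km for seq in references.values() for km in kmers(seq, k)})
    model = {}
    for taxon, seq in references.items():
        counts = Counter(kmers(seq, k))
        total = sum(counts.values()) + alpha * len(vocab)
        model[taxon] = {km: math.log((counts[km] + alpha) / total) for km in vocab}
    return model


def classify(model, query, priors=None, k=3):
    """Return the taxon with the highest log posterior for the query."""
    taxa = list(model)
    if priors is None:
        # The uniform assumption: every reference taxon is equally likely.
        priors = {taxon: 1.0 / len(taxa) for taxon in taxa}
    scores = {}
    for taxon in taxa:
        # k-mers absent from the training vocabulary are ignored.
        log_likelihood = sum(
            model[taxon][km] for km in kmers(query, k) if km in model[taxon]
        )
        scores[taxon] = math.log(priors[taxon]) + log_likelihood
    return max(scores, key=scores.get)


# Hypothetical two-taxon reference database and a short query read.
references = {"species_A": "ACGTACGTAC", "species_B": "ACGTACGTTC"}
model = train(references)
query = "ACGTACG"

# Hypothetical environment-specific weights: species_B is far more common.
abundance_priors = {"species_A": 0.1, "species_B": 0.9}

uniform_call = classify(model, query)
weighted_call = classify(model, query, priors=abundance_priors)
print(uniform_call)   # species_A
print(weighted_call)  # species_B
```

In this sketch the query's k-mer likelihood favours `species_A` by a factor of 2.25, so the uniform prior selects it, while a 9:1 prior in favour of `species_B` outweighs that margin. Real classifiers work with far larger reference sets and k-mer vocabularies, so the numbers here only indicate the direction of the effect.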
Main Authors: | Kaehler, Benjamin D.; Bokulich, Nicholas A.; McDonald, Daniel; Knight, Rob; Caporaso, J. Gregory; Huttley, Gavin A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6789115/ https://www.ncbi.nlm.nih.gov/pubmed/31604942 http://dx.doi.org/10.1038/s41467-019-12669-6 |
_version_ | 1783458580209336320 |
---|---|
author | Kaehler, Benjamin D. Bokulich, Nicholas A. McDonald, Daniel Knight, Rob Caporaso, J. Gregory Huttley, Gavin A. |
author_facet | Kaehler, Benjamin D. Bokulich, Nicholas A. McDonald, Daniel Knight, Rob Caporaso, J. Gregory Huttley, Gavin A. |
author_sort | Kaehler, Benjamin D. |
collection | PubMed |
description | Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments. |
format | Online Article Text |
id | pubmed-6789115 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-67891152019-10-15 Species abundance information improves sequence taxonomy classification accuracy Kaehler, Benjamin D. Bokulich, Nicholas A. McDonald, Daniel Knight, Rob Caporaso, J. Gregory Huttley, Gavin A. Nat Commun Article Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments. Nature Publishing Group UK 2019-10-11 /pmc/articles/PMC6789115/ /pubmed/31604942 http://dx.doi.org/10.1038/s41467-019-12669-6 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Kaehler, Benjamin D. Bokulich, Nicholas A. McDonald, Daniel Knight, Rob Caporaso, J. Gregory Huttley, Gavin A. Species abundance information improves sequence taxonomy classification accuracy |
title | Species abundance information improves sequence taxonomy classification accuracy |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6789115/ https://www.ncbi.nlm.nih.gov/pubmed/31604942 http://dx.doi.org/10.1038/s41467-019-12669-6 |