Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin
BACKGROUND: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. RESULTS: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy...
Main authors: | Bokulich, Nicholas A.; Kaehler, Benjamin D.; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A.; Gregory Caporaso, J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5956843/ https://www.ncbi.nlm.nih.gov/pubmed/29773078 http://dx.doi.org/10.1186/s40168-018-0470-z |
_version_ | 1783323962918305792 |
---|---|
author | Bokulich, Nicholas A. Kaehler, Benjamin D. Rideout, Jai Ram Dillon, Matthew Bolyen, Evan Knight, Rob Huttley, Gavin A. Gregory Caporaso, J. |
author_sort | Bokulich, Nicholas A. |
collection | PubMed |
description | BACKGROUND: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. RESULTS: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated “novel” marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). CONCLUSIONS: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub. |
format | Online Article Text |
id | pubmed-5956843 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5956843 2018-05-24 Microbiome Research BioMed Central 2018-05-17 /pmc/articles/PMC5956843/ /pubmed/29773078 http://dx.doi.org/10.1186/s40168-018-0470-z Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin |
title_sort | optimizing taxonomic classification of marker-gene amplicon sequences with qiime 2’s q2-feature-classifier plugin |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5956843/ https://www.ncbi.nlm.nih.gov/pubmed/29773078 http://dx.doi.org/10.1186/s40168-018-0470-z |