
Evaluating the accuracy of amplicon-based microbiome computational pipelines on simulated human gut microbial communities

BACKGROUND: Microbiome studies commonly use 16S rRNA gene amplicon sequencing to characterize microbial communities. Errors introduced at multiple steps in this process can affect the interpretation of the data. Here we evaluate the accuracy of operational taxonomic unit (OTU) generation, taxonomic...


Bibliographic Details
Main Authors:	Golob, Jonathan L., Margolis, Elisa, Hoffman, Noah G., Fredricks, David N.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5450146/ https://www.ncbi.nlm.nih.gov/pubmed/28558684 http://dx.doi.org/10.1186/s12859-017-1690-0

_version_	1783239906517057536
author	Golob, Jonathan L. Margolis, Elisa Hoffman, Noah G. Fredricks, David N.
author_facet	Golob, Jonathan L. Margolis, Elisa Hoffman, Noah G. Fredricks, David N.
author_sort	Golob, Jonathan L.
collection	PubMed
description	BACKGROUND: Microbiome studies commonly use 16S rRNA gene amplicon sequencing to characterize microbial communities. Errors introduced at multiple steps in this process can affect the interpretation of the data. Here we evaluate the accuracy of operational taxonomic unit (OTU) generation, taxonomic classification, and alpha- and beta-diversity measures for different settings in QIIME, MOTHUR and a pplacer-based classification pipeline, using a novel software package: DECARD. RESULTS: We generated 100 synthetic bacterial communities in silico, approximating human stool microbiomes, to be used as a gold standard for evaluating the colligative performance of microbiome analysis software. Our synthetic data closely matched the composition and complexity of actual healthy human stool microbiomes. Genus-level taxonomic classification was correct for only 50.4–74.8% of the source organisms. Miscall rates varied from 11.9 to 23.5%. Species-level classification was less successful (6.9–18.9% correct); miscall rates were comparable to those of genus-level targets (12.5–26.2%). The degree of miscall varied by clade of organism, pipeline and specific settings used. OTU generation accuracy varied by strategy (closed, de novo or subsampling), reference database, algorithm and software implementation. Shannon diversity estimation accuracy correlated generally with OTU-generation accuracy. Beta-diversity estimates with Double Principal Coordinate Analysis (DPCoA) were more robust against errors introduced in processing than Weighted UniFrac. The settings suggested in the tutorials were among the worst performing in all outcomes tested. CONCLUSIONS: Even when using the same classification pipeline, the specific OTU-generation strategy, reference database and choice of downstream analysis methods can have a dramatic effect on the accuracy of taxonomic classification and of alpha- and beta-diversity estimation. Even minor changes in settings adversely affected the accuracy of the results, bringing them far from the best-observed result. Thus, specific details of how a pipeline is used (including OTU generation strategy, reference sets, clustering algorithm and specific software implementation) should be specified in the methods section of all microbiome studies. Researchers should evaluate their chosen pipeline and settings to confirm that they can adequately answer the research question rather than assuming the tutorial or standard-operating-procedure settings will be adequate or optimal. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1690-0) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5450146
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5450146 2017-06-01 Evaluating the accuracy of amplicon-based microbiome computational pipelines on simulated human gut microbial communities Golob, Jonathan L. Margolis, Elisa Hoffman, Noah G. Fredricks, David N. BMC Bioinformatics Research Article BioMed Central 2017-05-30 /pmc/articles/PMC5450146/ /pubmed/28558684 http://dx.doi.org/10.1186/s12859-017-1690-0 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Golob, Jonathan L. Margolis, Elisa Hoffman, Noah G. Fredricks, David N. Evaluating the accuracy of amplicon-based microbiome computational pipelines on simulated human gut microbial communities
title	Evaluating the accuracy of amplicon-based microbiome computational pipelines on simulated human gut microbial communities
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5450146/ https://www.ncbi.nlm.nih.gov/pubmed/28558684 http://dx.doi.org/10.1186/s12859-017-1690-0
