Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline

Bibliographic Details
Main Authors: Straub, Daniel; Blackwell, Nia; Langarica-Fuentes, Adrian; Peltzer, Alexander; Nahnsen, Sven; Kleindienst, Sara
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Microbiology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7645116/
https://www.ncbi.nlm.nih.gov/pubmed/33193131
http://dx.doi.org/10.3389/fmicb.2020.550420
author Straub, Daniel
Blackwell, Nia
Langarica-Fuentes, Adrian
Peltzer, Alexander
Nahnsen, Sven
Kleindienst, Sara
collection PubMed
description One of the major methods to identify microbial community composition, to unravel microbial population dynamics, and to explore microbial diversity in environmental samples is high-throughput DNA- or RNA-based 16S rRNA (gene) amplicon sequencing in combination with bioinformatics analyses. However, focusing on environmental samples from contrasting habitats, it was not systematically evaluated (i) which analysis methods provide results that reflect reality most accurately, (ii) how the interpretations of microbial community studies are biased by different analysis methods and (iii) if the most optimal analysis workflow can be implemented in an easy-to-use pipeline. Here, we compared the performance of 16S rRNA (gene) amplicon sequencing analysis tools (i.e., Mothur, QIIME1, QIIME2, and MEGAN) using three mock datasets with known microbial community composition that differed in sequencing quality, species number and abundance distribution (i.e., even or uneven), and phylogenetic diversity (i.e., closely related or well-separated amplicon sequences). Our results showed that QIIME2 outcompeted all other investigated tools in sequence recovery (>10 times fewer false positives), taxonomic assignments (>22% better F-score) and diversity estimates (>5% better assessment), suggesting that this approach is able to reflect the in situ microbial community most accurately. Further analysis of 24 environmental datasets obtained from four contrasting terrestrial and freshwater sites revealed dramatic differences in the resulting microbial community composition for all pipelines at genus level. For instance, at the investigated river water sites Sphaerotilus was only reported when using QIIME1 (8% abundance) and Agitococcus with QIIME1 or QIIME2 (2 or 3% abundance, respectively), but both genera remained undetected when analyzed with Mothur or MEGAN. 
Since these abundant taxa probably have implications for important biogeochemical cycles (e.g., nitrate and sulfate reduction) at these sites, their detection and semi-quantitative enumeration is crucial for valid interpretations. A high-performance computing conformant workflow was constructed to allow FAIR (Findable, Accessible, Interoperable, and Re-usable) 16S rRNA (gene) amplicon sequence analysis starting from raw sequence files, using the most optimal methods identified in our study. Our presented workflow should be considered for future studies, thereby facilitating the analysis of high-throughput 16S rRNA (gene) sequencing data substantially, while maximizing reliability and confidence in microbial community data analysis.
format Online
Article
Text
id pubmed-7645116
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7645116 2020-11-13 Front Microbiol Microbiology Frontiers Media S.A. 2020-10-23 /pmc/articles/PMC7645116/ /pubmed/33193131 http://dx.doi.org/10.3389/fmicb.2020.550420 Text en
Copyright © 2020 Straub, Blackwell, Langarica-Fuentes, Peltzer, Nahnsen and Kleindienst. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline
topic Microbiology