
Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses

High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The s...


Bibliographic Details
Main authors: Callahan, Ben J., Sankaran, Kris, Fukuyama, Julia A., McMurdie, Paul J., Holmes, Susan P.
Format: Online Article Text
Language: English
Published: F1000Research 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4955027/
https://www.ncbi.nlm.nih.gov/pubmed/27508062
http://dx.doi.org/10.12688/f1000research.8986.2
author Callahan, Ben J.
Sankaran, Kris
Fukuyama, Julia A.
McMurdie, Paul J.
Holmes, Susan P.
collection PubMed
description High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package.
format Online
Article
Text
id pubmed-4955027
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher F1000Research
record_format MEDLINE/PubMed
spelling pubmed-4955027 2016-08-08 F1000Res Research Article F1000Research 2016-11-02 /pmc/articles/PMC4955027/ /pubmed/27508062 http://dx.doi.org/10.12688/f1000research.8986.2 Text en Copyright: © 2016 Callahan BJ et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses
topic Research Article