An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data

BACKGROUND: The Human Microbiome has been variously associated with the immune-regulatory mechanisms involved in the prevention or development of many non-infectious human diseases such as autoimmunity, allergy and cancer. Integrative approaches which aim at associating the composition of the human...

Bibliographic Details
Main Authors: Wadsworth, W. Duncan, Argiento, Raffaele, Guindani, Michele, Galloway-Pena, Jessica, Shelbourne, Samuel A., Vannucci, Marina
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5299727/
https://www.ncbi.nlm.nih.gov/pubmed/28178947
http://dx.doi.org/10.1186/s12859-017-1516-0
_version_ 1782506080386940928
author Wadsworth, W. Duncan
Argiento, Raffaele
Guindani, Michele
Galloway-Pena, Jessica
Shelbourne, Samuel A.
Vannucci, Marina
author_facet Wadsworth, W. Duncan
Argiento, Raffaele
Guindani, Michele
Galloway-Pena, Jessica
Shelbourne, Samuel A.
Vannucci, Marina
author_sort Wadsworth, W. Duncan
collection PubMed
description BACKGROUND: The Human Microbiome has been variously associated with the immune-regulatory mechanisms involved in the prevention or development of many non-infectious human diseases such as autoimmunity, allergy and cancer. Integrative approaches which aim at associating the composition of the human microbiome with other available information, such as clinical covariates and environmental predictors, are paramount to develop a more complete understanding of the role of microbiome in disease development. RESULTS: In this manuscript, we propose a Bayesian Dirichlet-Multinomial regression model which uses spike-and-slab priors for the selection of significant associations between a set of available covariates and taxa from a microbiome abundance table. The approach allows straightforward incorporation of the covariates through a log-linear regression parametrization of the parameters of the Dirichlet-Multinomial likelihood. Inference is conducted through a Markov Chain Monte Carlo algorithm, and selection of the significant covariates is based upon the assessment of posterior probabilities of inclusions and the thresholding of the Bayesian false discovery rate. We design a simulation study to evaluate the performance of the proposed method, and then apply our model on a publicly available dataset obtained from the Human Microbiome Project which associates taxa abundances with KEGG orthology pathways. The method is implemented in specifically developed R code, which has been made publicly available. CONCLUSIONS: Our method compares favorably in simulations to several recently proposed approaches for similarly structured data, in terms of increased accuracy and reduced false positive as well as false negative rates. In the application to the data from the Human Microbiome Project, a close evaluation of the biological significance of our findings confirms existing associations in the literature. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1516-0) contains supplementary material, which is available to authorized users.
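The description above names three ingredients: a Dirichlet-multinomial likelihood, a log-linear link from covariates to its parameters, and selection by thresholding a Bayesian false discovery rate computed from posterior probabilities of inclusion. The following is a minimal Python sketch of those pieces, not the authors' R implementation. The function names, the exact link form gamma_j = exp(alpha_j + sum_p beta_pj x_p), and the common plug-in FDR estimate (the mean of 1 - PPI over the selected set) are assumptions made for illustration; the record itself does not spell them out. The MCMC sampler and the spike-and-slab prior are not shown.

```python
import math

def dm_concentrations(x, alpha, beta):
    """Assumed log-linear link: gamma_j = exp(alpha_j + sum_p beta[p][j] * x[p])."""
    return [
        math.exp(alpha[j] + sum(beta[p][j] * x[p] for p in range(len(x))))
        for j in range(len(alpha))
    ]

def dm_log_pmf(y, gamma):
    """Log-probability of taxon counts y under a Dirichlet-multinomial(gamma)."""
    n, g_plus = sum(y), sum(gamma)
    # Multinomial coefficient: log n! - sum_j log y_j!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in y)
    # Ratio of gamma functions from integrating out the Dirichlet proportions
    log_norm = math.lgamma(g_plus) - math.lgamma(n + g_plus)
    log_terms = sum(math.lgamma(c + g) - math.lgamma(g) for c, g in zip(y, gamma))
    return log_coef + log_norm + log_terms

def bayesian_fdr_select(ppi, target_fdr=0.05):
    """Return indices of the largest top-PPI set whose estimated FDR,
    mean(1 - PPI) over the set, does not exceed target_fdr."""
    order = sorted(range(len(ppi)), key=lambda k: ppi[k], reverse=True)
    selected, err_sum = [], 0.0
    for k in order:
        new_sum = err_sum + (1.0 - ppi[k])
        if new_sum / (len(selected) + 1) > target_fdr:
            break  # the running mean only grows from here, so stop
        err_sum = new_sum
        selected.append(k)
    return sorted(selected)

# Hypothetical inputs: two covariates, two taxa, all coefficients zero.
x = [0.5, -1.0]
alpha = [0.0, 0.0]
beta = [[0.0, 0.0], [0.0, 0.0]]
gamma = dm_concentrations(x, alpha, beta)
log_p = dm_log_pmf([1, 2], gamma)

# Hypothetical posterior probabilities of inclusion for four associations.
selected = bayesian_fdr_select([0.99, 0.95, 0.60, 0.10], target_fdr=0.05)
```

With all coefficients set to zero, every concentration parameter equals 1, so the two-taxon case reduces to a uniform beta-binomial, which gives a quick sanity check on the likelihood.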
format Online
Article
Text
id pubmed-5299727
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5299727 2017-02-13 An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data Wadsworth, W. Duncan Argiento, Raffaele Guindani, Michele Galloway-Pena, Jessica Shelbourne, Samuel A. Vannucci, Marina BMC Bioinformatics Methodology Article BACKGROUND: The Human Microbiome has been variously associated with the immune-regulatory mechanisms involved in the prevention or development of many non-infectious human diseases such as autoimmunity, allergy and cancer. Integrative approaches which aim at associating the composition of the human microbiome with other available information, such as clinical covariates and environmental predictors, are paramount to develop a more complete understanding of the role of microbiome in disease development. RESULTS: In this manuscript, we propose a Bayesian Dirichlet-Multinomial regression model which uses spike-and-slab priors for the selection of significant associations between a set of available covariates and taxa from a microbiome abundance table. The approach allows straightforward incorporation of the covariates through a log-linear regression parametrization of the parameters of the Dirichlet-Multinomial likelihood. Inference is conducted through a Markov Chain Monte Carlo algorithm, and selection of the significant covariates is based upon the assessment of posterior probabilities of inclusions and the thresholding of the Bayesian false discovery rate. We design a simulation study to evaluate the performance of the proposed method, and then apply our model on a publicly available dataset obtained from the Human Microbiome Project which associates taxa abundances with KEGG orthology pathways. The method is implemented in specifically developed R code, which has been made publicly available.
CONCLUSIONS: Our method compares favorably in simulations to several recently proposed approaches for similarly structured data, in terms of increased accuracy and reduced false positive as well as false negative rates. In the application to the data from the Human Microbiome Project, a close evaluation of the biological significance of our findings confirms existing associations in the literature. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1516-0) contains supplementary material, which is available to authorized users. BioMed Central 2017-02-08 /pmc/articles/PMC5299727/ /pubmed/28178947 http://dx.doi.org/10.1186/s12859-017-1516-0 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Wadsworth, W. Duncan
Argiento, Raffaele
Guindani, Michele
Galloway-Pena, Jessica
Shelbourne, Samuel A.
Vannucci, Marina
An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data
title An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data
title_sort integrative bayesian dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5299727/
https://www.ncbi.nlm.nih.gov/pubmed/28178947
http://dx.doi.org/10.1186/s12859-017-1516-0