
A Bayesian method for identifying associations between response variables and bacterial community composition

Determining associations between intestinal bacteria and continuously measured physiological outcomes is important for understanding the bacteria-host relationship but is not straightforward since abundance data (compositional data) are not normally distributed. To address this issue, we developed a...


Bibliographic Details
Main authors: Verster, Adrian, Petronella, Nicholas, Green, Judy, Matias, Fernando, Brooks, Stephen P. J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9307184/
https://www.ncbi.nlm.nih.gov/pubmed/35793382
http://dx.doi.org/10.1371/journal.pcbi.1010108
author Verster, Adrian
Petronella, Nicholas
Green, Judy
Matias, Fernando
Brooks, Stephen P. J.
collection PubMed
description Determining associations between intestinal bacteria and continuously measured physiological outcomes is important for understanding the bacteria-host relationship but is not straightforward since abundance data (compositional data) are not normally distributed. To address this issue, we developed a fully Bayesian linear regression model (BRACoD; Bayesian Regression Analysis of Compositional Data) with physiological measurements (continuous data) as a function of a matrix of relative bacterial abundances. Bacteria can be classified as operational taxonomic units or by taxonomy (genus, family, etc.). Bacteria associated with the physiological measurement were identified using a Bayesian variable selection method: Stochastic Search Variable Selection. The output is a list of inclusion probabilities ([Image: see text] ) and coefficients that indicate the strength of the association ([Image: see text] ) for each bacterial taxon. Tests with simulated communities showed that adopting a cut point value of [Image: see text] ≥ 0.3 for identifying included bacteria optimized the true positive rate (TPR) while maintaining a false positive rate (FPR) of ≤ 5%. At this point, the chances of identifying non-contributing bacteria were low and all well-established contributors were included. Comparison with other methods showed that BRACoD (at [Image: see text] ≥ 0.3) had higher precision and a higher TPR than a commonly used centered log-ratio transformed LASSO procedure (clr-LASSO) as well as a higher TPR than an off-the-shelf Spike and Slab method after centered log-ratio transformation (clr-SS). BRACoD was also less likely to include non-contributing bacteria that merely correlate with contributing bacteria. Analysis of a rat microbiome experiment identified 47 operational taxonomic units that contributed to fecal butyrate levels. Of these, 31 were positively and 16 negatively associated with butyrate. Consistent with their known role in butyrate metabolism, most of these fell within the Lachnospiraceae and Ruminococcaceae. We conclude that BRACoD provides a more precise and accurate method for determining bacteria associated with a continuous physiological outcome compared to clr-LASSO. It is more sensitive than a generalized clr-SS algorithm, although it has a higher FPR. Its ability to distinguish genuine contributors from correlated bacteria makes it better suited to discriminating bacteria that directly contribute to an outcome. The algorithm corrects for the distortions arising from compositional data, making it appropriate for analysis of microbiome data.
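The abstract refers to three computations that are easy to illustrate in isolation: the centered log-ratio (clr) transform used by the comparison methods, the selection of taxa whose inclusion probability meets the 0.3 cut point, and the TPR/FPR used to score a selection against known contributors. The following is a minimal sketch of those pieces only, not the BRACoD implementation. The function names, the pseudocount, and all numeric values are hypothetical and chosen purely for illustration.

```python
import math


def clr(composition, pseudocount=1e-6):
    """Centered log-ratio transform of one sample's relative abundances.

    The pseudocount (an assumption of this sketch) avoids log(0) for absent taxa.
    """
    logs = [math.log(x + pseudocount) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def select_taxa(inclusion_probs, cutoff=0.3):
    """Return indices of taxa whose inclusion probability is >= cutoff."""
    return [i for i, p in enumerate(inclusion_probs) if p >= cutoff]


def tpr_fpr(selected, true_contributors, n_taxa):
    """True and false positive rates of a selection against known contributors."""
    selected_set = set(selected)
    truth = set(true_contributors)
    n_pos = len(truth)
    n_neg = n_taxa - n_pos
    true_pos = len(selected_set & truth)
    false_pos = len(selected_set - truth)
    tpr = true_pos / n_pos if n_pos else 0.0
    fpr = false_pos / n_neg if n_neg else 0.0
    return tpr, fpr


# Hypothetical relative abundances for one sample (they sum to 1).
sample = [0.5, 0.3, 0.15, 0.05]
clr_values = clr(sample)

# Hypothetical inclusion probabilities for five taxa; taxa 0 and 4 are
# treated as the true contributors in this made-up simulation.
probs = [0.92, 0.31, 0.05, 0.28, 0.67]
selected = select_taxa(probs, cutoff=0.3)
tpr, fpr = tpr_fpr(selected, true_contributors=[0, 4], n_taxa=len(probs))
```

By construction the clr values of a sample sum to zero, which is one way to check the transform. In a simulation study such as the one the abstract describes, `tpr_fpr` would be evaluated over a range of cut points to see where the FPR stays at or below the target.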
format Online
Article
Text
id pubmed-9307184
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9307184 2022-07-23 PLoS Comput Biol Research Article Public Library of Science 2022-07-06 /pmc/articles/PMC9307184/ /pubmed/35793382 http://dx.doi.org/10.1371/journal.pcbi.1010108 Text en © 2022 Verster et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A Bayesian method for identifying associations between response variables and bacterial community composition
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9307184/
https://www.ncbi.nlm.nih.gov/pubmed/35793382
http://dx.doi.org/10.1371/journal.pcbi.1010108