
A Bayesian method for detecting pairwise associations in compositional data



Bibliographic Details
Main Authors: Schwager, Emma; Mallick, Himel; Ventz, Steffen; Huttenhower, Curtis
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5706738/
https://www.ncbi.nlm.nih.gov/pubmed/29140991
http://dx.doi.org/10.1371/journal.pcbi.1005852
author Schwager, Emma
Mallick, Himel
Ventz, Steffen
Huttenhower, Curtis
collection PubMed
description Compositional data consist of vectors of proportions normalized to a constant sum from a basis of unobserved counts. The sum constraint makes inference on correlations between unconstrained features challenging due to the information loss from normalization. However, such correlations are of long-standing interest in fields including ecology. We propose a novel Bayesian framework (BAnOCC: Bayesian Analysis of Compositional Covariance) to estimate a sparse precision matrix through a LASSO prior. The resulting posterior, generated by MCMC sampling, allows uncertainty quantification of any function of the precision matrix, including the correlation matrix. We also use a first-order Taylor expansion to approximate the transformation from the unobserved counts to the composition in order to investigate what characteristics of the unobserved counts can make the correlations more or less difficult to infer. On simulated datasets, we show that BAnOCC infers the true network as well as previous methods while offering the advantage of posterior inference. Larger and more realistic simulated datasets further showed that BAnOCC performs well as measured by type I and type II error rates. Finally, we apply BAnOCC to a microbial ecology dataset from the Human Microbiome Project, which in addition to reproducing established ecological results revealed unique, competition-based roles for Proteobacteria in multiple distinct habitats.
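The description notes that posterior draws of the precision matrix support uncertainty quantification for any function of it, including the correlation matrix. The sketch below illustrates that step under stated assumptions: each MCMC draw is taken to be a precision matrix Ω stored as a nested list of floats, and the helper names (`precision_to_correlation`, `summarize`, `excludes_zero`) are hypothetical, not part of the authors' software. It uses the standard identity Σ = Ω⁻¹, ρ_ij = Σ_ij / √(Σ_ii Σ_jj); flagging a pair when its credible interval excludes zero is an illustrative decision rule assumed here.

```python
from statistics import median


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds A^{-1}.
    return [row[n:] for row in aug]


def precision_to_correlation(omega):
    """Correlation matrix implied by a precision matrix: Sigma = Omega^{-1}, then rescale."""
    sigma = invert(omega)
    n = len(sigma)
    return [[sigma[i][j] / (sigma[i][i] * sigma[j][j]) ** 0.5 for j in range(n)]
            for i in range(n)]


def summarize(samples, i, j, level=0.95):
    """Posterior median and approximate equal-tailed credible interval for rho_ij.

    `samples` is a list of precision-matrix draws (hypothetical input format).
    Quantiles use a simple nearest-rank rule, which is coarse for few draws.
    """
    draws = sorted(precision_to_correlation(s)[i][j] for s in samples)
    last = len(draws) - 1
    lo = draws[round((1 - level) / 2 * last)]
    hi = draws[round((1 + level) / 2 * last)]
    return median(draws), (lo, hi)


def excludes_zero(interval):
    """Illustrative decision rule (an assumption here): flag the pair if 0 lies outside the interval."""
    lo, hi = interval
    return lo > 0 or hi < 0
```

As a toy check, draws of the form [[2, -c], [-c, 2]] imply ρ₁₂ = c/2, so the posterior summary of ρ₁₂ simply tracks c/2 across the draws. Because any function of Ω can be pushed through each draw in the same way, the same pattern extends to partial correlations or other derived quantities.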
format Online
Article
Text
id pubmed-5706738
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5706738 2017-12-08 A Bayesian method for detecting pairwise associations in compositional data Schwager, Emma Mallick, Himel Ventz, Steffen Huttenhower, Curtis PLoS Comput Biol Research Article
Public Library of Science 2017-11-15 /pmc/articles/PMC5706738/ /pubmed/29140991 http://dx.doi.org/10.1371/journal.pcbi.1005852 Text en © 2017 Schwager et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A Bayesian method for detecting pairwise associations in compositional data
topic Research Article