A statistical model for describing and simulating microbial community profiles
Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or...
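Main Authors: | Ma, Siyuan; Ren, Boyu; Mallick, Himel; Moon, Yo Sup; Schwager, Emma; Maharjan, Sagun; Tickle, Timothy L.; Lu, Yiren; Carmody, Rachel N.; Franzosa, Eric A.; Janson, Lucas; Huttenhower, Curtis |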
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8491899/ https://www.ncbi.nlm.nih.gov/pubmed/34516542 http://dx.doi.org/10.1371/journal.pcbi.1008913 |
_version_ | 1784578822734086144 |
---|---|
author | Ma, Siyuan; Ren, Boyu; Mallick, Himel; Moon, Yo Sup; Schwager, Emma; Maharjan, Sagun; Tickle, Timothy L.; Lu, Yiren; Carmody, Rachel N.; Franzosa, Eric A.; Janson, Lucas; Huttenhower, Curtis |
author_sort | Ma, Siyuan |
collection | PubMed |
description | Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or evaluate such methods within a single systematic framework. To address this challenge, we developed SparseDOSSA (Sparse Data Observations for the Simulation of Synthetic Abundances): a statistical model of microbial ecological population structure, which can be used to parameterize real-world microbial community profiles and to simulate new, realistic profiles of known structure for methods evaluation. Specifically, SparseDOSSA’s model captures marginal microbial feature abundances as a zero-inflated log-normal distribution, with additional model components for absolute cell counts and the sequence read generation process, microbe-microbe, and microbe-environment interactions. Together, these allow fully known covariance structure between synthetic features (i.e. “taxa”) or between features and “phenotypes” to be simulated for method benchmarking. Here, we demonstrate SparseDOSSA’s performance for 1) accurately modeling human-associated microbial population profiles; 2) generating synthetic communities with controlled population and ecological structures; 3) spiking-in true positive synthetic associations to benchmark analysis methods; and 4) recapitulating an end-to-end mouse microbiome feeding experiment. Together, these represent the most common analysis types in assessment of real microbial community environmental and epidemiological statistics, thus demonstrating SparseDOSSA’s utility as a general-purpose aid for modeling communities and evaluating quantitative methods. An open-source implementation is available at http://huttenhower.sph.harvard.edu/sparsedossa2. |
format | Online Article Text |
id | pubmed-8491899 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8491899 2021-10-06 A statistical model for describing and simulating microbial community profiles. PLoS Comput Biol. Research Article. Public Library of Science 2021-09-13. /pmc/articles/PMC8491899/ /pubmed/34516542 http://dx.doi.org/10.1371/journal.pcbi.1008913 Text en. © 2021 Ma et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | A statistical model for describing and simulating microbial community profiles |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8491899/ https://www.ncbi.nlm.nih.gov/pubmed/34516542 http://dx.doi.org/10.1371/journal.pcbi.1008913 |
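
The abstract describes marginal feature abundances as following a zero-inflated log-normal distribution. As a rough illustration of that one idea only, the following Python sketch draws independent features that are zero with some probability and log-normal otherwise, then renormalizes each sample to relative abundances. It is not SparseDOSSA's implementation or API: the function names and the parameter values are hypothetical, and the sketch deliberately omits the feature-feature correlation, read-count, and microbe-environment components mentioned in the abstract.

```python
import random


def sample_zero_inflated_lognormal(pi_zero, mu, sigma, rng):
    """Return 0.0 with probability pi_zero, otherwise a log-normal(mu, sigma) draw."""
    if rng.random() < pi_zero:
        return 0.0
    return rng.lognormvariate(mu, sigma)


def simulate_profile(params, rng):
    """Simulate one sample of independent features as relative abundances."""
    raw = [sample_zero_inflated_lognormal(p, m, s, rng) for (p, m, s) in params]
    total = sum(raw)
    if total == 0.0:
        return raw  # every feature is absent, so there is nothing to normalize
    return [x / total for x in raw]


# Hypothetical per-feature parameters (pi_zero, mu, sigma); not values from the paper.
FEATURE_PARAMS = [(0.2, 0.0, 1.0), (0.5, -1.0, 0.5), (0.8, 1.0, 1.5)]

rng = random.Random(0)  # fixed seed so the simulation is repeatable
profiles = [simulate_profile(FEATURE_PARAMS, rng) for _ in range(100)]
```

Because the parameters that generated `profiles` are known, a benchmark could compare them against what an analysis method recovers, which is the general use case the abstract describes for simulated data.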