
A statistical model for describing and simulating microbial community profiles

Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or...

Bibliographic Details
Main Authors: Ma, Siyuan, Ren, Boyu, Mallick, Himel, Moon, Yo Sup, Schwager, Emma, Maharjan, Sagun, Tickle, Timothy L., Lu, Yiren, Carmody, Rachel N., Franzosa, Eric A., Janson, Lucas, Huttenhower, Curtis
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8491899/
https://www.ncbi.nlm.nih.gov/pubmed/34516542
http://dx.doi.org/10.1371/journal.pcbi.1008913
author Ma, Siyuan
Ren, Boyu
Mallick, Himel
Moon, Yo Sup
Schwager, Emma
Maharjan, Sagun
Tickle, Timothy L.
Lu, Yiren
Carmody, Rachel N.
Franzosa, Eric A.
Janson, Lucas
Huttenhower, Curtis
author_sort Ma, Siyuan
collection PubMed
description Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or evaluate such methods within a single systematic framework. To address this challenge, we developed SparseDOSSA (Sparse Data Observations for the Simulation of Synthetic Abundances): a statistical model of microbial ecological population structure, which can be used to parameterize real-world microbial community profiles and to simulate new, realistic profiles of known structure for methods evaluation. Specifically, SparseDOSSA’s model captures marginal microbial feature abundances as a zero-inflated log-normal distribution, with additional model components for absolute cell counts and the sequence read generation process, microbe-microbe, and microbe-environment interactions. Together, these allow fully known covariance structure between synthetic features (i.e. “taxa”) or between features and “phenotypes” to be simulated for method benchmarking. Here, we demonstrate SparseDOSSA’s performance for 1) accurately modeling human-associated microbial population profiles; 2) generating synthetic communities with controlled population and ecological structures; 3) spiking-in true positive synthetic associations to benchmark analysis methods; and 4) recapitulating an end-to-end mouse microbiome feeding experiment. Together, these represent the most common analysis types in assessment of real microbial community environmental and epidemiological statistics, thus demonstrating SparseDOSSA’s utility as a general-purpose aid for modeling communities and evaluating quantitative methods. An open-source implementation is available at http://huttenhower.sph.harvard.edu/sparsedossa2.
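The zero-inflated log-normal marginal named in the description can be illustrated with a minimal sketch. This is not SparseDOSSA's implementation or API: the function names and the parameter values are hypothetical, features are drawn independently (so the microbe-microbe and microbe-environment components are omitted), and the read generation step is left out. It only shows the marginal idea: each feature is zero with some probability, otherwise log-normally distributed, and a sample is then normalized to relative abundances.

```python
import math
import random


def sample_zero_inflated_lognormal(pi_zero, mu, sigma, rng):
    """Return 0 with probability pi_zero, otherwise exp(Normal(mu, sigma))."""
    if rng.random() < pi_zero:
        return 0.0
    return math.exp(rng.gauss(mu, sigma))


def simulate_profile(params, rng):
    """Simulate one sample's relative abundances.

    params holds one (pi_zero, mu, sigma) tuple per feature.
    Features are drawn independently, which is a simplifying assumption.
    """
    absolute = [sample_zero_inflated_lognormal(p, m, s, rng) for p, m, s in params]
    total = sum(absolute)
    if total == 0.0:
        # Every feature was absent in this sample; nothing to normalize.
        return [0.0] * len(params)
    return [a / total for a in absolute]


# Hypothetical parameters for three features: (zero probability, log-mean, log-sd).
params = [(0.2, 0.0, 1.0), (0.5, -1.0, 0.5), (0.9, 1.0, 2.0)]
rng = random.Random(0)
profiles = [simulate_profile(params, rng) for _ in range(1000)]

# Observed fraction of zeros per feature, to compare against pi_zero.
zero_fraction = [
    sum(p[j] == 0.0 for p in profiles) / len(profiles)
    for j in range(len(params))
]
```

Because the generating parameters are known, the observed zero fractions can be checked against the chosen zero probabilities, which is the sense in which simulated profiles have "known structure" for benchmarking.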
format Online
Article
Text
id pubmed-8491899
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8491899 2021-10-06 A statistical model for describing and simulating microbial community profiles PLoS Comput Biol Research Article Public Library of Science 2021-09-13 /pmc/articles/PMC8491899/ /pubmed/34516542 http://dx.doi.org/10.1371/journal.pcbi.1008913 Text en © 2021 Ma et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A statistical model for describing and simulating microbial community profiles
topic Research Article