Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics

Bibliographic Details
Main Authors: Holmes, Ian; Harris, Keith; Quince, Christopher
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3272020/
https://www.ncbi.nlm.nih.gov/pubmed/22319561
http://dx.doi.org/10.1371/journal.pone.0030126
collection PubMed
description We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. These data can be represented as a frequency matrix giving the number of times each taxon is observed in each sample. The samples differ in size, and the matrix is sparse, because communities are diverse and skewed towards rare taxa. Most methods previously used to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. Each vector is generated from one of a finite number of Dirichlet mixture components, each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct ‘metacommunities’ and hence determine envirotypes or enterotypes, that is, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for fitting DMM models using the ‘evidence framework’ (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from Obese and Lean twins. According to the model evidence, four clusters fit these data best. Two clusters were dominated by Bacteroides and were homogeneous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, Obese twins were more likely to derive from the high-variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the ‘Anna Karenina principle (AKP)’ applied to microbial communities: disturbed states have many more configurations than undisturbed ones. We verify this by showing that, in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a more variable community.
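The generative process summarised in the description (choose a Dirichlet mixture component, draw a vector of taxa probabilities from it, then draw counts by multinomial sampling) can be illustrated with a small simulation. This is only a sketch under stated assumptions: it is not the authors' software, it performs no model fitting, and the function names, mixing weights, hyperparameters and sample sizes are all hypothetical values chosen for illustration.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalised Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_dmm(weights, alphas, sample_sizes, seed=0):
    """Simulate (component, counts) pairs from a Dirichlet multinomial mixture.

    weights      : mixing proportions of the components (illustrative values)
    alphas       : one Dirichlet hyperparameter vector per component
    sample_sizes : number of reads to draw for each simulated sample
    """
    rng = random.Random(seed)
    n_taxa = len(alphas[0])
    samples = []
    for size in sample_sizes:
        # 1. Pick the mixture component (metacommunity) for this sample.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # 2. Draw the community's taxa probabilities from that component.
        p = sample_dirichlet(alphas[k], rng)
        # 3. Draw the observed counts by multinomial sampling of `size` reads.
        reads = rng.choices(range(n_taxa), weights=p, k=size)
        tally = Counter(reads)
        samples.append((k, [tally.get(t, 0) for t in range(n_taxa)]))
    return samples


if __name__ == "__main__":
    # Hypothetical setup: component 0 is concentrated on the first taxon,
    # component 1 has small hyperparameters and so is far more variable.
    weights = [0.5, 0.5]
    alphas = [[50.0, 5.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
    for k, counts in sample_dmm(weights, alphas, [100, 250, 80]):
        print(k, counts)
```

In this toy setup, samples from the component with large hyperparameters tend to look alike, while samples from the component with small hyperparameters vary widely from one draw to the next, which loosely mirrors the contrast between homogeneous and high-variance clusters described above.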
format Online
Article
Text
id pubmed-3272020
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3272020 2012-02-08 Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics Holmes, Ian Harris, Keith Quince, Christopher PLoS One Research Article Public Library of Science 2012-02-03 /pmc/articles/PMC3272020/ /pubmed/22319561 http://dx.doi.org/10.1371/journal.pone.0030126 Text en Holmes et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics
topic Research Article