A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies
The human microbiome consists of a community of microbes in varying abundances and is shown to be associated with many diseases. An important first step in many microbiome studies is to identify possible distinct microbial communities in a given data set and to identify the important bacterial taxa...
Main Authors: | Deek, Rebecca A., Li, Hongzhe |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7862749/ https://www.ncbi.nlm.nih.gov/pubmed/33552122 http://dx.doi.org/10.3389/fgene.2020.602594 |
_version_ | 1783647356825108480 |
---|---|
author | Deek, Rebecca A. Li, Hongzhe |
author_facet | Deek, Rebecca A. Li, Hongzhe |
author_sort | Deek, Rebecca A. |
collection | PubMed |
description | The human microbiome consists of a community of microbes in varying abundances and is shown to be associated with many diseases. An important first step in many microbiome studies is to identify possible distinct microbial communities in a given data set and to identify the important bacterial taxa that characterize these communities. The data from typical microbiome studies are high dimensional count data with excessive zeros due to both absence of species (structural zeros) and low sequencing depth or dropout. Although methods have been developed for identifying the microbial communities based on mixture models of counts, these methods do not account for excessive zeros observed in the data and do not differentiate structural from sampling zeros. In this paper, we introduce a zero-inflated Latent Dirichlet Allocation model (zinLDA) for sparse count data observed in microbiome studies. zinLDA builds on the flexible Latent Dirichlet Allocation model and allows for zero inflation in observed counts. We develop an efficient Markov chain Monte Carlo (MCMC) sampling procedure to fit the model. Results from our simulations show zinLDA provides better fits to the data and is able to separate structural zeros from sampling zeros. We apply zinLDA to the data set from the American Gut Project and identify microbial communities characterized by different bacterial genera. |
format | Online Article Text |
id | pubmed-7862749 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7862749 2021-02-06 A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies Deek, Rebecca A. Li, Hongzhe Front Genet Genetics The human microbiome consists of a community of microbes in varying abundances and is shown to be associated with many diseases. An important first step in many microbiome studies is to identify possible distinct microbial communities in a given data set and to identify the important bacterial taxa that characterize these communities. The data from typical microbiome studies are high dimensional count data with excessive zeros due to both absence of species (structural zeros) and low sequencing depth or dropout. Although methods have been developed for identifying the microbial communities based on mixture models of counts, these methods do not account for excessive zeros observed in the data and do not differentiate structural from sampling zeros. In this paper, we introduce a zero-inflated Latent Dirichlet Allocation model (zinLDA) for sparse count data observed in microbiome studies. zinLDA builds on the flexible Latent Dirichlet Allocation model and allows for zero inflation in observed counts. We develop an efficient Markov chain Monte Carlo (MCMC) sampling procedure to fit the model. Results from our simulations show zinLDA provides better fits to the data and is able to separate structural zeros from sampling zeros. We apply zinLDA to the data set from the American Gut Project and identify microbial communities characterized by different bacterial genera. Frontiers Media S.A. 2021-01-22 /pmc/articles/PMC7862749/ /pubmed/33552122 http://dx.doi.org/10.3389/fgene.2020.602594 Text en Copyright © 2021 Deek and Li. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Deek, Rebecca A. Li, Hongzhe A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies |
title | A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies |
title_full | A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies |
title_fullStr | A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies |
title_full_unstemmed | A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies |
title_short | A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies |
title_sort | zero-inflated latent dirichlet allocation model for microbiome studies |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7862749/ https://www.ncbi.nlm.nih.gov/pubmed/33552122 http://dx.doi.org/10.3389/fgene.2020.602594 |
work_keys_str_mv | AT deekrebeccaa azeroinflatedlatentdirichletallocationmodelformicrobiomestudies AT lihongzhe azeroinflatedlatentdirichletallocationmodelformicrobiomestudies AT deekrebeccaa zeroinflatedlatentdirichletallocationmodelformicrobiomestudies AT lihongzhe zeroinflatedlatentdirichletallocationmodelformicrobiomestudies |