A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies

Bibliographic Details
Main authors: Deek, Rebecca A.; Li, Hongzhe
Format: Online article (text)
Language: English
Published: Frontiers Media S.A., 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7862749/
https://www.ncbi.nlm.nih.gov/pubmed/33552122
http://dx.doi.org/10.3389/fgene.2020.602594
Collection: PubMed
Description: The human microbiome consists of a community of microbes in varying abundances and has been shown to be associated with many diseases. An important first step in many microbiome studies is to identify possible distinct microbial communities in a given data set and to identify the important bacterial taxa that characterize these communities. The data from typical microbiome studies are high-dimensional count data with excess zeros arising both from the absence of species (structural zeros) and from low sequencing depth or dropout (sampling zeros). Although methods have been developed for identifying microbial communities based on mixture models of counts, these methods do not account for the excess zeros observed in the data and do not differentiate structural from sampling zeros. In this paper, we introduce a zero-inflated Latent Dirichlet Allocation model (zinLDA) for the sparse count data observed in microbiome studies. zinLDA builds on the flexible Latent Dirichlet Allocation model and allows for zero inflation in the observed counts. We develop an efficient Markov chain Monte Carlo (MCMC) sampling procedure to fit the model. Results from our simulations show that zinLDA provides better fits to the data and is able to separate structural zeros from sampling zeros. We apply zinLDA to the data set from the American Gut Project and identify microbial communities characterized by different bacterial genera.
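The distinction drawn above between structural zeros (a taxon that is truly absent) and sampling zeros (a taxon that is present but unobserved at the given sequencing depth) can be illustrated with a small simulation. The following is a minimal sketch of a simplified, hypothetical zero-inflated LDA-style generative process, not the authors' zinLDA specification and not their MCMC sampler. The function names `simulate_zero_inflated_lda` and `dirichlet`, the independent Bernoulli removal of each taxon from each community with probability `pi`, and the symmetric Dirichlet parameters `alpha` and `eta` are all illustrative assumptions.

```python
import random


def dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_zero_inflated_lda(n_samples, n_taxa, n_communities, depth,
                               pi=0.5, alpha=1.0, eta=1.0, seed=0):
    """Simulate a count table from a simplified zero-inflated LDA-style model.

    HYPOTHETICAL: each community's taxa distribution is Dirichlet(eta), and
    each taxon is independently removed from a community with probability pi.
    This illustrates structural vs. sampling zeros only; it is not the
    paper's exact zinLDA prior or fitting procedure.
    """
    rng = random.Random(seed)
    taxa = range(n_taxa)
    communities = range(n_communities)

    # Community-by-taxon distributions containing structural (forced) zeros.
    phis = []
    for _ in communities:
        phi = dirichlet([eta] * n_taxa, rng)
        present = [rng.random() >= pi for _ in taxa]
        if not any(present):  # keep at least one taxon so phi stays a distribution
            present[rng.randrange(n_taxa)] = True
        phi = [p if keep else 0.0 for p, keep in zip(phi, present)]
        total = sum(phi)
        phis.append([p / total for p in phi])

    counts, labels = [], []
    for _ in range(n_samples):
        theta = dirichlet([alpha] * n_communities, rng)  # sample's community mix
        # Assign each of the `depth` reads to a community, then to a taxon.
        z = rng.choices(communities, weights=theta, k=depth)
        row = [0] * n_taxa
        for k in communities:
            n_k = z.count(k)
            if n_k:
                for v in rng.choices(taxa, weights=phis[k], k=n_k):
                    row[v] += 1
        # Marginal probability of each taxon in this sample.
        prob = [sum(theta[k] * phis[k][v] for k in communities) for v in taxa]
        label = []
        for v in taxa:
            if prob[v] == 0.0:
                label.append("structural")  # absent from every community
            elif row[v] == 0:
                label.append("sampling")    # possible, but unobserved at this depth
            else:
                label.append("observed")
        counts.append(row)
        labels.append(label)
    return counts, labels


if __name__ == "__main__":
    counts, labels = simulate_zero_inflated_lda(20, 30, 3, depth=200, seed=1)
    flat = [lab for row in labels for lab in row]
    print("structural zeros:", flat.count("structural"))
    print("sampling zeros:", flat.count("sampling"))
```

Because each sample mixes several communities, a taxon removed from one community can still appear through another; in this sketch a zero is labeled structural only when the taxon has zero probability in every community, and every other zero is a sampling zero caused by finite depth. Raising `depth` should shrink the share of sampling zeros while leaving the structural zeros unchanged.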
Record ID: pubmed-7862749
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Front Genet (Frontiers in Genetics)
Publication date: 2021-01-22
Copyright: © 2021 Deek and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Subject: Genetics