Clustering microbiome data using mixtures of logistic normal multinomial models
Discrete data, such as counts of microbiome taxa obtained from next-generation sequencing, are routinely encountered in bioinformatics. Taxa count data in microbiome studies are typically high-dimensional and over-dispersed, and they reveal only relative abundance, so they are treated as compositional...
Main Authors: | Fang, Yuan; Subedi, Sanjeena |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10484970/ https://www.ncbi.nlm.nih.gov/pubmed/37679485 http://dx.doi.org/10.1038/s41598-023-41318-8 |
_version_ | 1785102690837069824 |
---|---|
author | Fang, Yuan; Subedi, Sanjeena |
author_facet | Fang, Yuan; Subedi, Sanjeena |
author_sort | Fang, Yuan |
collection | PubMed |
description | Discrete data, such as counts of microbiome taxa obtained from next-generation sequencing, are routinely encountered in bioinformatics. Taxa count data in microbiome studies are typically high-dimensional and over-dispersed, and they reveal only relative abundance, so they are treated as compositional. Analyzing compositional data is challenging because the data are restricted to a simplex. In a logistic normal multinomial model, the relative abundance is mapped from the simplex to a latent variable in real Euclidean space using the additive log-ratio transformation. While the logistic normal multinomial approach offers flexibility in modeling the data, it carries a heavy computational cost because parameter estimation typically relies on Bayesian techniques. In this paper, we develop a novel mixture of logistic normal multinomial models for clustering microbiome data. We also use an efficient framework for parameter estimation based on variational Gaussian approximations (VGA). Adopting a variational Gaussian approximation for the posterior of the latent variable reduces the computational overhead substantially. The proposed method is illustrated on simulated and real datasets. |
format | Online Article Text |
id | pubmed-10484970 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10484970 2023-09-09 Clustering microbiome data using mixtures of logistic normal multinomial models. Fang, Yuan; Subedi, Sanjeena. Sci Rep. Article. Nature Publishing Group UK 2023-09-07 /pmc/articles/PMC10484970/ /pubmed/37679485 http://dx.doi.org/10.1038/s41598-023-41318-8 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
title | Clustering microbiome data using mixtures of logistic normal multinomial models |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10484970/ https://www.ncbi.nlm.nih.gov/pubmed/37679485 http://dx.doi.org/10.1038/s41598-023-41318-8 |
work_keys_str_mv | AT fangyuan clusteringmicrobiomedatausingmixturesoflogisticnormalmultinomialmodels AT subedisanjeena clusteringmicrobiomedatausingmixturesoflogisticnormalmultinomialmodels |
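
The description above mentions the additive log-ratio transformation, which maps relative abundances on a simplex to an unconstrained real-valued vector. The following is a minimal sketch of that mapping and its inverse, not the authors' implementation. The function names `alr` and `alr_inverse`, the choice of the last taxon as the reference component, and the example counts are all assumptions made for illustration.

```python
import math


def alr(composition):
    """Additive log-ratio transform.

    Maps K+1 strictly positive proportions to K real values by taking
    the log of each component relative to a reference component.
    Assumption of this sketch: the last component is the reference.
    Zero proportions are not handled (log(0) is undefined).
    """
    *head, ref = composition
    return [math.log(p / ref) for p in head]


def alr_inverse(latent):
    """Map K real values back to K+1 proportions that sum to 1."""
    exps = [math.exp(y) for y in latent] + [1.0]  # reference maps to exp(0)
    total = sum(exps)
    return [e / total for e in exps]


# Made-up taxa counts for a single sample (illustrative only).
counts = [30, 50, 20]
depth = sum(counts)
proportions = [c / depth for c in counts]

latent = alr(proportions)        # two unconstrained real values
recovered = alr_inverse(latent)  # back on the simplex
```

Three proportions yield two latent values because the sum-to-one constraint removes one degree of freedom. In the model described above, a Gaussian distribution is placed on this latent vector rather than on the proportions themselves.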