
Clustering microbiome data using mixtures of logistic normal multinomial models

Discrete data such as counts of microbiome taxa resulting from next-generation sequencing are routinely encountered in bioinformatics. Taxa count data in microbiome studies are typically high-dimensional, over-dispersed, and can only reveal relative abundance therefore being treated as compositional...


Bibliographic Details
Main authors: Fang, Yuan; Subedi, Sanjeena
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10484970/
https://www.ncbi.nlm.nih.gov/pubmed/37679485
http://dx.doi.org/10.1038/s41598-023-41318-8
_version_ 1785102690837069824
author Fang, Yuan
Subedi, Sanjeena
author_facet Fang, Yuan
Subedi, Sanjeena
author_sort Fang, Yuan
collection PubMed
description Discrete data such as counts of microbiome taxa resulting from next-generation sequencing are routinely encountered in bioinformatics. Taxa count data in microbiome studies are typically high-dimensional, over-dispersed, and can only reveal relative abundance therefore being treated as compositional. Analyzing compositional data presents many challenges because they are restricted to a simplex. In a logistic normal multinomial model, the relative abundance is mapped from a simplex to a latent variable that exists on the real Euclidean space using the additive log-ratio transformation. While a logistic normal multinomial approach brings flexibility for modeling the data, it comes with a heavy computational cost as the parameter estimation typically relies on Bayesian techniques. In this paper, we develop a novel mixture of logistic normal multinomial models for clustering microbiome data. Additionally, we utilize an efficient framework for parameter estimation using variational Gaussian approximations (VGA). Adopting a variational Gaussian approximation for the posterior of the latent variable reduces the computational overhead substantially. The proposed method is illustrated on simulated and real datasets.
format Online
Article
Text
id pubmed-10484970
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10484970 2023-09-09 Clustering microbiome data using mixtures of logistic normal multinomial models Fang, Yuan Subedi, Sanjeena Sci Rep Article Nature Publishing Group UK 2023-09-07 /pmc/articles/PMC10484970/ /pubmed/37679485 http://dx.doi.org/10.1038/s41598-023-41318-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Fang, Yuan
Subedi, Sanjeena
Clustering microbiome data using mixtures of logistic normal multinomial models
title Clustering microbiome data using mixtures of logistic normal multinomial models
title_sort clustering microbiome data using mixtures of logistic normal multinomial models
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10484970/
https://www.ncbi.nlm.nih.gov/pubmed/37679485
http://dx.doi.org/10.1038/s41598-023-41318-8
work_keys_str_mv AT fangyuan clusteringmicrobiomedatausingmixturesoflogisticnormalmultinomialmodels
AT subedisanjeena clusteringmicrobiomedatausingmixturesoflogisticnormalmultinomialmodels
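
Note on the additive log-ratio transformation: the description above says that relative abundances are mapped from a simplex to a latent variable on the real Euclidean space using the additive log-ratio transformation. The following is a minimal sketch of that mapping and its inverse, not the authors' implementation. The function names alr and alr_inverse, the choice of the last part as the reference, and the toy counts are all assumptions made for illustration. In the logistic normal multinomial model the transformation applies to the underlying multinomial probabilities rather than to observed proportions; the toy proportions here only illustrate the mapping itself.

```python
import math
from typing import List, Sequence


def alr(composition: Sequence[float]) -> List[float]:
    """Additive log-ratio transform (hypothetical helper).

    Maps a (K+1)-part composition on the simplex to K unconstrained real
    values, using the last part as the reference (an assumed convention).
    """
    if len(composition) < 2:
        raise ValueError("need at least two parts")
    if any(p <= 0 for p in composition):
        raise ValueError("all parts must be strictly positive")
    reference = composition[-1]
    return [math.log(p / reference) for p in composition[:-1]]


def alr_inverse(latent: Sequence[float]) -> List[float]:
    """Inverse transform: map K real values back to a (K+1)-part composition."""
    # The reference part has an implicit latent coordinate of 0.
    # Subtract the largest exponent before exponentiating for stability.
    shift = max(list(latent) + [0.0])
    weights = [math.exp(y - shift) for y in latent] + [math.exp(-shift)]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up counts for three taxa, for illustration only.
counts = [30, 50, 20]
total_count = sum(counts)
proportions = [c / total_count for c in counts]

latent = alr(proportions)        # two unconstrained real values
recovered = alr_inverse(latent)  # back on the simplex, sums to 1
```

The sketch rejects zero parts because the logarithm of zero is undefined; how zeros in real count data should be handled is outside what this record describes.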