Finite mixtures of matrix variate Poisson-log normal distributions for three-way count data
Main authors: | Silva, Anjali; Qin, Xiaoke; Rothstein, Steven J; McNicholas, Paul D; Subedi, Sanjeena |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | Original Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10159656/ https://www.ncbi.nlm.nih.gov/pubmed/37018147 http://dx.doi.org/10.1093/bioinformatics/btad167 |
author | Silva, Anjali; Qin, Xiaoke; Rothstein, Steven J; McNicholas, Paul D; Subedi, Sanjeena |
---|---|
collection | PubMed |
description | MOTIVATION: Three-way data structures, characterized by three entities (the units, the variables and the occasions), are frequent in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes across p conditions at r occasions. Matrix variate distributions offer a natural way to model three-way data, and mixtures of matrix variate distributions can be used to cluster three-way data. Clustering of gene expression data is carried out as a means of discovering gene co-expression networks. RESULTS: In this work, a mixture of matrix variate Poisson-log normal distributions is proposed for clustering read counts from RNA sequencing. By considering the matrix variate structure, full information on the conditions and occasions of the RNA sequencing dataset is considered simultaneously, and the number of covariance parameters to be estimated is reduced. We propose three different frameworks for parameter estimation: a Markov chain Monte Carlo-based approach, a variational Gaussian approximation-based approach, and a hybrid approach. Various information criteria are used for model selection. The models are applied to both real and simulated data, and we demonstrate that the proposed approaches can recover the underlying cluster structure in both cases. In simulation studies where the true model parameters are known, our proposed approach shows good parameter recovery. AVAILABILITY AND IMPLEMENTATION: The R package for this work is available on GitHub at https://github.com/anjalisilva/mixMVPLN and is released under the open source MIT license. |
format | Online Article Text |
id | pubmed-10159656 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Bioinformatics (Original Paper), Oxford University Press, 2023-04-05 |
rights | © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Finite mixtures of matrix variate Poisson-log normal distributions for three-way count data |
topic | Original Paper |
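The record describes the model only in words. As a rough illustration, assuming a common matrix variate Poisson-log normal formulation (each gene contributes a p × r count matrix X; a latent log-rate matrix Y follows a matrix normal distribution with mean M_g, a p × p covariance Φ_g among conditions and an r × r covariance Ω_g among occasions; and, given Y, each X_ij is independently Poisson with rate exp(Y_ij)), the Python sketch that follows simulates counts from a G-component mixture and counts covariance parameters per component. The symbols Φ and Ω and every function name here are hypothetical choices for this sketch, not the notation of the paper or the API of the mixMVPLN R package, and library-size offsets are omitted.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T == a, for a small symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sample_matrix_normal(mean, phi, omega, rng):
    """Draw Y = M + A Z B^T with phi = A A^T (p x p) and omega = B B^T (r x r),
    so that vec(Y) ~ N(vec(M), omega ⊗ phi)."""
    a, b = cholesky(phi), cholesky(omega)
    z = [[rng.gauss(0.0, 1.0) for _ in omega] for _ in phi]  # p x r standard normals
    noise = matmul(matmul(a, z), [list(col) for col in zip(*b)])
    return [[m + e for m, e in zip(mrow, erow)] for mrow, erow in zip(mean, noise)]


def sample_poisson(lam, rng):
    """Knuth's multiplication method; fine for the moderate rates used here."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_mixture_mvpln(n, weights, means, phis, omegas, seed=1):
    """Simulate n units (e.g. genes), each a p x r count matrix, from a G-component mixture."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n):
        g = rng.choices(range(len(weights)), weights=weights)[0]  # latent cluster label
        y = sample_matrix_normal(means[g], phis[g], omegas[g], rng)  # latent log-rates
        data.append([[sample_poisson(math.exp(v), rng) for v in row] for row in y])
        labels.append(g)
    return data, labels


def covariance_parameter_counts(p, r):
    """Free covariance parameters per component: unstructured (pr x pr) vs. omega ⊗ phi."""
    unstructured = p * r * (p * r + 1) // 2
    # One parameter is subtracted because (c * omega) ⊗ (phi / c) == omega ⊗ phi.
    kronecker = p * (p + 1) // 2 + r * (r + 1) // 2 - 1
    return unstructured, kronecker


if __name__ == "__main__":
    phi = [[0.5, 0.2], [0.2, 0.5]]                                # p = 2 conditions
    omega = [[1.0, 0.3, 0.1], [0.3, 1.0, 0.3], [0.1, 0.3, 1.0]]   # r = 3 occasions
    means = [[[1.0] * 3 for _ in range(2)], [[3.0] * 3 for _ in range(2)]]
    data, labels = sample_mixture_mvpln(5, [0.6, 0.4], means, [phi, phi], [omega, omega])
    for counts, g in zip(data, labels):
        print(g, counts)
    print(covariance_parameter_counts(3, 4))  # → (78, 15)
```

For p = 3 conditions and r = 4 occasions, an unstructured covariance on the 12 latent entries needs 78 parameters, while the Kronecker form needs 15; this is one way to read the abstract's statement that the matrix variate structure reduces the number of covariance parameters. Parameter estimation (the MCMC, variational Gaussian approximation and hybrid frameworks) is not sketched here.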