
Finite mixtures of matrix variate Poisson-log normal distributions for three-way count data

MOTIVATION: Three-way data structures, characterized by three entities, the units, the variables and the occasions, are frequent in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes across p cond...


Bibliographic Details
Main authors: Silva, Anjali, Qin, Xiaoke, Rothstein, Steven J, McNicholas, Paul D, Subedi, Sanjeena
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10159656/
https://www.ncbi.nlm.nih.gov/pubmed/37018147
http://dx.doi.org/10.1093/bioinformatics/btad167
author Silva, Anjali
Qin, Xiaoke
Rothstein, Steven J
McNicholas, Paul D
Subedi, Sanjeena
collection PubMed
description MOTIVATION: Three-way data structures, characterized by three entities, the units, the variables and the occasions, are frequent in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes across p conditions at r occasions. Matrix variate distributions offer a natural way to model three-way data, and mixtures of matrix variate distributions can be used to cluster three-way data. Clustering of gene expression data is carried out as a means of discovering gene co-expression networks. RESULTS: In this work, a mixture of matrix variate Poisson-log normal distributions is proposed for clustering read counts from RNA sequencing. By considering the matrix variate structure, full information on the conditions and occasions of the RNA sequencing dataset is simultaneously considered, and the number of covariance parameters to be estimated is reduced. We propose three different frameworks for parameter estimation: a Markov chain Monte Carlo-based approach, a variational Gaussian approximation-based approach, and a hybrid approach. Various information criteria are used for model selection. The models are applied to both real and simulated data, and we demonstrate that the proposed approaches can recover the underlying cluster structure in both cases. In simulation studies where the true model parameters are known, our proposed approach shows good parameter recovery. AVAILABILITY AND IMPLEMENTATION: The GitHub R package for this work is available at https://github.com/anjalisilva/mixMVPLN and is released under the open-source MIT license.
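Note on the description above: the claim that the matrix variate structure reduces the number of covariance parameters can be illustrated with a simple count. The sketch below is not taken from the mixMVPLN package. It assumes the usual matrix variate normal setup, in which each r x p observation has one r x r covariance matrix across occasions and one p x p covariance matrix across conditions, and it compares that with a single unstructured covariance matrix on the vectorized rp-dimensional observation. The function names and the example values r = 3 and p = 4 are hypothetical, chosen only for illustration.

```python
def free_cov_params(d: int) -> int:
    """Number of free parameters in a symmetric d x d covariance matrix."""
    return d * (d + 1) // 2


def covariance_param_counts(r: int, p: int) -> dict:
    """Compare covariance parameter counts per mixture component.

    Assumption (not from the record): each observation is an r x p matrix,
    with r occasions and p conditions.
    """
    # Unstructured: one covariance matrix over all r * p vectorized entries.
    unstructured = free_cov_params(r * p)
    # Matrix variate: separate r x r and p x p covariance matrices.
    # This simple count ignores the scale identifiability constraint
    # between the two matrices.
    matrix_variate = free_cov_params(r) + free_cov_params(p)
    return {"unstructured": unstructured, "matrix_variate": matrix_variate}


# Hypothetical example: 3 occasions and 4 conditions.
counts = covariance_param_counts(r=3, p=4)
print(counts)
```

Under these assumptions the gap widens quickly as r and p grow, since the unstructured count grows with the square of r times p while the matrix variate count grows with the squares of r and p separately.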
format Online
Article
Text
id pubmed-10159656
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10159656 2023-05-05 Finite mixtures of matrix variate Poisson-log normal distributions for three-way count data Silva, Anjali Qin, Xiaoke Rothstein, Steven J McNicholas, Paul D Subedi, Sanjeena Bioinformatics Original Paper
Oxford University Press 2023-04-05 /pmc/articles/PMC10159656/ /pubmed/37018147 http://dx.doi.org/10.1093/bioinformatics/btad167 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Finite mixtures of matrix variate Poisson-log normal distributions for three-way count data
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10159656/
https://www.ncbi.nlm.nih.gov/pubmed/37018147
http://dx.doi.org/10.1093/bioinformatics/btad167