A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data
BACKGROUND: High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for...
Main authors: | Silva, Anjali; Rothstein, Steven J.; McNicholas, Paul D.; Subedi, Sanjeena |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6636065/ https://www.ncbi.nlm.nih.gov/pubmed/31311497 http://dx.doi.org/10.1186/s12859-019-2916-0 |
_version_ | 1783435999037095936 |
---|---|
author | Silva, Anjali Rothstein, Steven J. McNicholas, Paul D. Subedi, Sanjeena |
author_facet | Silva, Anjali Rothstein, Steven J. McNicholas, Paul D. Subedi, Sanjeena |
author_sort | Silva, Anjali |
collection | PubMed |
description | BACKGROUND: High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on biological functions and pathways of gene products. RESULTS: A mixture of multivariate Poisson-log normal (MPLN) model is developed for clustering of high-throughput transcriptome sequencing data. Parameter estimation is carried out using a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, and information criteria are used for model selection. CONCLUSIONS: The mixture of MPLN model is able to fit a wide range of correlation and overdispersion situations, and is suited for modeling multivariate count data from RNA sequencing studies. All scripts used for implementing the method can be found at https://github.com/anjalisilva/MPLNClust. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2916-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6636065 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6636065 2019-07-25 A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data Silva, Anjali Rothstein, Steven J. McNicholas, Paul D. Subedi, Sanjeena BMC Bioinformatics Methodology Article BioMed Central 2019-07-16 /pmc/articles/PMC6636065/ /pubmed/31311497 http://dx.doi.org/10.1186/s12859-019-2916-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Silva, Anjali Rothstein, Steven J. McNicholas, Paul D. Subedi, Sanjeena A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data |
title | A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data |
title_full | A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data |
title_fullStr | A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data |
title_full_unstemmed | A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data |
title_short | A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data |
title_sort | multivariate poisson-log normal mixture model for clustering transcriptome sequencing data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6636065/ https://www.ncbi.nlm.nih.gov/pubmed/31311497 http://dx.doi.org/10.1186/s12859-019-2916-0 |
work_keys_str_mv | AT silvaanjali amultivariatepoissonlognormalmixturemodelforclusteringtranscriptomesequencingdata AT rothsteinstevenj amultivariatepoissonlognormalmixturemodelforclusteringtranscriptomesequencingdata AT mcnicholaspauld amultivariatepoissonlognormalmixturemodelforclusteringtranscriptomesequencingdata AT subedisanjeena amultivariatepoissonlognormalmixturemodelforclusteringtranscriptomesequencingdata AT silvaanjali multivariatepoissonlognormalmixturemodelforclusteringtranscriptomesequencingdata AT rothsteinstevenj multivariatepoissonlognormalmixturemodelforclusteringtranscriptomesequencingdata AT mcnicholaspauld multivariatepoissonlognormalmixturemodelforclusteringtranscriptomesequencingdata AT subedisanjeena multivariatepoissonlognormalmixturemodelforclusteringtranscriptomesequencingdata |
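
The description above says that a mixture of MPLN distributions can accommodate correlation and overdispersion in multivariate count data. As a rough illustration only, the Python sketch below simulates counts from a two-component MPLN mixture using the standard hierarchical form (a latent multivariate normal vector whose exponential is the Poisson mean) and compares the empirical mean and variance of one component with the known closed-form moments. It is not the authors' implementation (the linked MPLNClust scripts are the reference), it does not perform MCMC-EM estimation or model selection, and it omits any normalization offsets; the function name `sample_mpln_mixture` and all parameter values are hypothetical.

```python
import numpy as np


def sample_mpln_mixture(n, weights, means, covs, rng):
    """Draw n count vectors from a mixture of multivariate Poisson-log normal
    components (illustrative helper, not part of any published package)."""
    weights = np.asarray(weights, dtype=float)
    # Pick a component label for every observation.
    labels = rng.choice(len(weights), size=n, p=weights)
    d = len(means[0])
    counts = np.empty((n, d), dtype=np.int64)
    for g in range(len(weights)):
        idx = np.flatnonzero(labels == g)
        # Latent layer: theta ~ N_d(mu_g, Sigma_g).
        theta = rng.multivariate_normal(means[g], covs[g], size=idx.size)
        # Observed layer: Y_j | theta_j ~ Poisson(exp(theta_j)),
        # independent across j given theta.
        counts[idx] = rng.poisson(np.exp(theta))
    return counts, labels


rng = np.random.default_rng(0)

# Hypothetical parameters: 3 variables, 2 components.
means = [np.array([1.0, 2.0, 3.0]), np.array([3.0, 1.0, 2.0])]
covs = [0.2 * np.eye(3) + 0.1,    # positive latent correlation
        0.3 * np.eye(3) - 0.05]   # negative latent correlation
counts, labels = sample_mpln_mixture(20000, [0.6, 0.4], means, covs, rng)

# Moments of the first component, empirical vs. closed form.
first = counts[labels == 0]
emp_mean = first.mean(axis=0)
emp_var = first.var(axis=0)
sigma_diag = np.diag(covs[0])
theory_mean = np.exp(means[0] + 0.5 * sigma_diag)
theory_var = theory_mean + theory_mean**2 * (np.exp(sigma_diag) - 1.0)

print("empirical mean:", emp_mean)
print("theoretical mean:", theory_mean)
print("empirical variance:", emp_var)
print("theoretical variance:", theory_var)
```

In this setup the marginal variance exceeds the mean whenever the latent variance is positive, which is the overdispersion the description refers to, and the off-diagonal entries of each covariance matrix induce correlation between the count variables within a component.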