A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data

BACKGROUND: High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for...

Bibliographic Details
Main authors: Silva, Anjali; Rothstein, Steven J.; McNicholas, Paul D.; Subedi, Sanjeena
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6636065/
https://www.ncbi.nlm.nih.gov/pubmed/31311497
http://dx.doi.org/10.1186/s12859-019-2916-0
author Silva, Anjali
Rothstein, Steven J.
McNicholas, Paul D.
Subedi, Sanjeena
collection PubMed
description BACKGROUND: High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on biological functions and pathways of gene products. RESULTS: A mixture of multivariate Poisson-log normal (MPLN) model is developed for clustering of high-throughput transcriptome sequencing data. Parameter estimation is carried out using a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, and information criteria are used for model selection. CONCLUSIONS: The mixture of MPLN model is able to fit a wide range of correlation and overdispersion situations, and is suited for modeling multivariate count data from RNA sequencing studies. All scripts used for implementing the method can be found at https://github.com/anjalisilva/MPLNClust. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2916-0) contains supplementary material, which is available to authorized users.
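The model named in the abstract can be illustrated with a minimal simulation sketch. It assumes the standard multivariate Poisson-log normal construction: within a component, a latent vector theta is drawn from a multivariate normal, and each count is Poisson with mean exp(theta). The function name `sample_mpln_mixture` and all parameter values are hypothetical, no normalization offset is included, and this is not the authors' MPLNClust code or their MCMC-EM estimation procedure; it only shows how such a mixture generates correlated, overdispersed counts.

```python
import numpy as np


def sample_mpln_mixture(n, weights, means, covs, seed=0):
    """Draw n count vectors from a mixture of multivariate Poisson-log normals.

    Illustrative only: weights, means and covs are user-supplied, hypothetical
    component parameters (mixing proportions, latent means, latent covariances).
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    n_components = len(weights)
    dim = len(means[0])

    # Step 1: assign each observation to a component using the mixing proportions.
    labels = rng.choice(n_components, size=n, p=weights)

    counts = np.empty((n, dim), dtype=np.int64)
    for g in range(n_components):
        idx = np.flatnonzero(labels == g)
        if idx.size == 0:
            continue
        # Step 2: latent log-means theta ~ N(mu_g, Sigma_g).
        theta = rng.multivariate_normal(means[g], covs[g], size=idx.size)
        # Step 3: counts are conditionally independent Poisson with mean exp(theta).
        counts[idx] = rng.poisson(np.exp(theta))
    return counts, labels


if __name__ == "__main__":
    y, z = sample_mpln_mixture(
        n=1000,
        weights=[0.4, 0.6],
        means=[[1.0, 2.0, 0.5], [3.0, 0.5, 2.0]],
        covs=[0.5 * np.eye(3), 0.3 * np.eye(3)],
        seed=1,
    )
    # Overdispersion shows up as a per-variable variance larger than the mean.
    print("mean:", y.mean(axis=0))
    print("variance:", y.var(axis=0))
```

Because the Poisson rate itself is random, the marginal variance of each count exceeds its mean, and off-diagonal entries in a component covariance induce correlation between variables, which is the flexibility the abstract attributes to the model.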
format Online
Article
Text
id pubmed-6636065
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6636065 2019-07-25 A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data. Silva, Anjali; Rothstein, Steven J.; McNicholas, Paul D.; Subedi, Sanjeena. BMC Bioinformatics. Methodology Article.
BioMed Central 2019-07-16 /pmc/articles/PMC6636065/ /pubmed/31311497 http://dx.doi.org/10.1186/s12859-019-2916-0 Text en © The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data
topic Methodology Article