Covariate-dependent negative binomial factor analysis of RNA sequencing data

MOTIVATION: High-throughput sequencing technologies, in particular RNA sequencing (RNA-seq), have become the basic practice for genomic studies in biomedical research. In addition to studying genes individually, for example, through differential expression analysis, investigating co-ordinated expres...

Bibliographic Details
Main Authors:	Zamani Dadaneh, Siamak; Zhou, Mingyuan; Qian, Xiaoning
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022606/ https://www.ncbi.nlm.nih.gov/pubmed/29949981 http://dx.doi.org/10.1093/bioinformatics/bty237

collection	PubMed
description	MOTIVATION: High-throughput sequencing technologies, in particular RNA sequencing (RNA-seq), have become standard practice for genomic studies in biomedical research. In addition to studying genes individually, for example through differential expression analysis, investigating coordinated expression variations of genes may help reveal the underlying cellular mechanisms, leading to better understanding and more effective prognosis and intervention strategies. Although a variety of co-expression network based methods exist for analyzing microarray data for this purpose, blindly extending these microarray methods to RNA-seq data may introduce unnecessary bias; it is therefore crucial to develop methods well adapted to RNA-seq data to identify the functional modules of genes with similar expression patterns. RESULTS: We have developed a fully Bayesian covariate-dependent negative binomial factor analysis (dNBFA) method for RNA-seq count data, to capture coordinated gene expression changes while considering effects from covariates reflecting different influencing factors. Unlike existing co-expression network based methods, our proposed model does not require multiple ad hoc choices of data processing, transformation, and co-expression measures, and can be directly applied to RNA-seq data. Furthermore, being capable of incorporating covariate information, the proposed method can tackle setups with complex confounding factors in different experiment designs. Finally, the natural model parameterization removes the need for a normalization preprocessing step, as commonly adopted to compensate for the effect of sequencing-depth variations. Efficient Bayesian inference of model parameters is derived by exploiting conditional conjugacy via novel data augmentation techniques. Experimental results on several real-world RNA-seq datasets on complex diseases suggest dNBFA as a powerful tool for discovering gene modules with significant differential expression and meaningful biological insight. AVAILABILITY AND IMPLEMENTATION: dNBFA is implemented in R and is available at https://github.com/siamakz/dNBFA.
format	Online Article Text
id	pubmed-6022606
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6022606 2018-07-10 Bioinformatics Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022606/ /pubmed/29949981 http://dx.doi.org/10.1093/bioinformatics/bty237 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
