
BADGE: A novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data

BACKGROUND: Recent advances in RNA sequencing (RNA-Seq) technology have offered unprecedented scope and resolution for transcriptome analysis. However, precise quantification of mRNA abundance and identification of differentially expressed genes are complicated due to biological and technical variat...


Bibliographic Details
Main authors: Gu, Jinghua, Wang, Xiao, Halakivi-Clarke, Leena, Clarke, Robert, Xuan, Jianhua
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168709/
https://www.ncbi.nlm.nih.gov/pubmed/25252852
http://dx.doi.org/10.1186/1471-2105-15-S9-S6
author Gu, Jinghua
Wang, Xiao
Halakivi-Clarke, Leena
Clarke, Robert
Xuan, Jianhua
collection PubMed
description BACKGROUND: Recent advances in RNA sequencing (RNA-Seq) technology have offered unprecedented scope and resolution for transcriptome analysis. However, precise quantification of mRNA abundance and identification of differentially expressed genes are complicated due to biological and technical variations in RNA-Seq data. RESULTS: We systematically study the variation in count data and dissect the sources of variation into between-sample variation and within-sample variation. A novel Bayesian framework is developed for joint estimate of gene level mRNA abundance and differential state, which models the intrinsic variability in RNA-Seq to improve the estimation. Specifically, a Poisson-Lognormal model is incorporated into the Bayesian framework to model within-sample variation; a Gamma-Gamma model is then used to model between-sample variation, which accounts for over-dispersion of read counts among multiple samples. Simulation studies, where sequencing counts are synthesized based on parameters learned from real datasets, have demonstrated the advantage of the proposed method in both quantification of mRNA abundance and identification of differentially expressed genes. Moreover, performance comparison on data from the Sequencing Quality Control (SEQC) Project with ERCC spike-in controls has shown that the proposed method outperforms existing RNA-Seq methods in differential analysis. Application on breast cancer dataset has further illustrated that the proposed Bayesian model can 'blindly' estimate sources of variation caused by sequencing biases. CONCLUSIONS: We have developed a novel Bayesian hierarchical approach to investigate within-sample and between-sample variations in RNA-Seq data. Simulation and real data applications have validated desirable performance of the proposed method. The software package is available at http://www.cbil.ece.vt.edu/software.htm.
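The two layers named in the abstract can be illustrated with a small generative simulation. This is only a minimal sketch under stated assumptions, not the BADGE implementation: the abstract does not give the parameterization, so the function names (`sample_poisson`, `within_sample_counts`, `simulate_gene`), the parameters (`shape`, `prior_shape`, `prior_rate`, `sigma`), and the choice to put the Gamma prior on the rate are all hypothetical. It draws a per-sample abundance from a Gamma-Gamma hierarchy (between-sample variation) and then draws per-position read counts from a Poisson whose rate carries a lognormal bias factor (within-sample variation).

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    count = 0
    # Process the rate in chunks of at most 30 so exp(-step) never underflows;
    # a sum of independent Poisson draws is Poisson with the summed rate.
    while lam > 0:
        step = min(lam, 30.0)
        threshold = math.exp(-step)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        count += k
        lam -= step
    return count


def within_sample_counts(abundance, n_positions, sigma, rng):
    """Poisson-Lognormal layer (assumed form): each position's Poisson rate is
    the sample abundance multiplied by a lognormal bias factor."""
    return [
        sample_poisson(abundance * rng.lognormvariate(0.0, sigma), rng)
        for _ in range(n_positions)
    ]


def simulate_gene(n_samples, n_positions, shape, prior_shape, prior_rate, sigma, rng):
    """Gamma-Gamma layer (assumed form): a gene-level rate is drawn from a Gamma
    prior, then each sample's abundance is Gamma(shape, rate)."""
    # random.gammavariate takes (shape, scale), so a rate r becomes scale 1/r.
    rate = rng.gammavariate(prior_shape, 1.0 / prior_rate)
    counts = []
    for _ in range(n_samples):
        abundance = rng.gammavariate(shape, 1.0 / rate)
        counts.append(within_sample_counts(abundance, n_positions, sigma, rng))
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    counts = simulate_gene(
        n_samples=4, n_positions=50, shape=10.0,
        prior_shape=5.0, prior_rate=10.0, sigma=0.8, rng=rng,
    )
    # Print the per-sample mean and variance of the position-level counts.
    for row in counts:
        print(round(statistics.mean(row), 2), round(statistics.variance(row), 2))
```

In such a simulation the per-sample variance of the counts typically exceeds the mean, which is the over-dispersion relative to a plain Poisson model that the abstract refers to.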
format Online
Article
Text
id pubmed-4168709
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4168709 2014-10-02 BMC Bioinformatics Proceedings BioMed Central 2014-09-10 /pmc/articles/PMC4168709/ /pubmed/25252852 http://dx.doi.org/10.1186/1471-2105-15-S9-S6 Text en Copyright © 2014 Gu et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title BADGE: A novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data
topic Proceedings