MCMC implementation of the optimal Bayesian classifier for non-Gaussian models: model-based RNA-Seq classification

BACKGROUND: Sequencing datasets consist of a finite number of reads which map to specific regions of a reference genome. Most effort in modeling these datasets focuses on the detection of univariate differentially expressed genes. However, for classification, we must consider multiple genes and thei...

Bibliographic Details
Main Authors: Knight, Jason M, Ivanov, Ivan, Dougherty, Edward R
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4265360/
https://www.ncbi.nlm.nih.gov/pubmed/25491122
http://dx.doi.org/10.1186/s12859-014-0401-3
author Knight, Jason M
Ivanov, Ivan
Dougherty, Edward R
collection PubMed
description BACKGROUND: Sequencing datasets consist of a finite number of reads which map to specific regions of a reference genome. Most effort in modeling these datasets focuses on the detection of univariate differentially expressed genes. However, for classification, we must consider multiple genes and their interactions. RESULTS: Thus, we introduce a hierarchical multivariate Poisson model (MP) and the associated optimal Bayesian classifier (OBC) for classifying samples using sequencing data. Lacking closed-form solutions, we employ a Markov chain Monte Carlo (MCMC) approach to perform classification. We demonstrate superior or equivalent classification performance compared to typical classifiers for two synthetic datasets and over a range of classification problem difficulties. We also introduce the Bayesian minimum mean squared error (MMSE) conditional error estimator and demonstrate its computation over the feature space. In addition, we demonstrate superior or leading class performance over an RNA-Seq dataset containing two lung cancer tumor types from The Cancer Genome Atlas (TCGA). CONCLUSIONS: Through model-based, optimal Bayesian classification, we demonstrate superior classification performance for both synthetic and real RNA-Seq datasets. A tutorial video and Python source code are available under an open source license at http://bit.ly/1gimnss. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0401-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4265360
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4265360 2014-12-15 BMC Bioinformatics Research Article BioMed Central 2014-12-10 /pmc/articles/PMC4265360/ /pubmed/25491122 http://dx.doi.org/10.1186/s12859-014-0401-3 Text en © Knight et al.; licensee BioMed Central Ltd. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title MCMC implementation of the optimal Bayesian classifier for non-Gaussian models: model-based RNA-Seq classification
topic Research Article