A Bayesian approach for inducing sparsity in generalized linear models with multi-category response

BACKGROUND: The dimension and complexity of high-throughput gene expression data create many challenges for downstream analysis. Several approaches exist to reduce the number of variables with respect to small sample sizes. In this study, we utilized the Generalized Double Pareto (GDP) prior to indu...

Bibliographic Details
Main Authors:	Madahian, Behrouz; Roy, Sujoy; Bowman, Dale; Deng, Lih Y; Homayouni, Ramin
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4597416/ https://www.ncbi.nlm.nih.gov/pubmed/26423345 http://dx.doi.org/10.1186/1471-2105-16-S13-S13

_version_	1782393920388333568
author	Madahian, Behrouz; Roy, Sujoy; Bowman, Dale; Deng, Lih Y; Homayouni, Ramin
author_sort	Madahian, Behrouz
collection	PubMed
description	BACKGROUND: The dimension and complexity of high-throughput gene expression data create many challenges for downstream analysis. Several approaches exist to reduce the number of variables with respect to small sample sizes. In this study, we utilized the Generalized Double Pareto (GDP) prior to induce sparsity in a Bayesian Generalized Linear Model (GLM) setting. The approach was evaluated using a publicly available microarray dataset containing 99 samples corresponding to four different prostate cancer subtypes. RESULTS: A hierarchical Sparse Bayesian GLM using GDP prior (SBGG) was developed to take into account the progressive nature of the response variable. We obtained an average overall classification accuracy between 82.5% and 94%, which was higher than Support Vector Machine, Random Forest or a Sparse Bayesian GLM using double exponential priors. Additionally, SBGG outperforms the other 3 methods in correctly identifying pre-metastatic stages of cancer progression, which can prove extremely valuable for therapeutic and diagnostic purposes. Importantly, using Geneset Cohesion Analysis Tool, we found that the top 100 genes produced by SBGG had an average functional cohesion p-value of 2.0E-4 compared to 0.007 to 0.131 produced by the other methods. CONCLUSIONS: Using GDP in a Bayesian GLM model applied to cancer progression data results in better subclass prediction. In particular, the method identifies pre-metastatic stages of prostate cancer with substantially better accuracy and produces more functionally relevant gene sets.
format	Online Article Text
id	pubmed-4597416
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4597416 2015-10-08 BMC Bioinformatics Proceedings BioMed Central 2015-09-25 /pmc/articles/PMC4597416/ /pubmed/26423345 http://dx.doi.org/10.1186/1471-2105-16-S13-S13 Text en Copyright © 2015 Madahian et al.
http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A Bayesian approach for inducing sparsity in generalized linear models with multi-category response
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4597416/ https://www.ncbi.nlm.nih.gov/pubmed/26423345 http://dx.doi.org/10.1186/1471-2105-16-S13-S13

Similar Items