Bayesian variable selection for binary outcomes in high-dimensional genomic studies using non-local priors

Bibliographic Details
Main authors: Nikooienejad, Amir; Wang, Wenyi; Johnson, Valen E.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4848399/
https://www.ncbi.nlm.nih.gov/pubmed/26740524
http://dx.doi.org/10.1093/bioinformatics/btv764
author Nikooienejad, Amir
Wang, Wenyi
Johnson, Valen E.
collection PubMed
description Motivation: The advent of new genomic technologies has resulted in the production of massive data sets. Analyses of these data require new statistical and computational methods. In this article, we propose one such method that is useful in selecting explanatory variables for prediction of a binary response. Although this problem has recently been addressed using penalized likelihood methods, we adopt a Bayesian approach that utilizes a mixture of non-local prior densities and point masses on the binary regression coefficient vectors. Results: The resulting method, which we call iMOMLogit, provides improved performance in identifying true models and reducing estimation and prediction error in a number of simulation studies. More importantly, its application to several genomic datasets produces predictions that have high accuracy using far fewer explanatory variables than competing methods. We also describe a novel approach for setting prior hyperparameters by examining the total variation distance between the prior distributions on the regression parameters and the distribution of the maximum likelihood estimator under the null distribution. Finally, we describe a computational algorithm that can be used to implement iMOMLogit in ultrahigh-dimensional settings ([Formula: see text]) and provide diagnostics to assess the probability that this algorithm has identified the highest posterior probability model. Availability and implementation: Software to implement this method can be downloaded at: http://www.stat.tamu.edu/~amir/code.html. Contact: wwang7@mdanderson.org or vjohnson@stat.tamu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4848399
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4848399 2016-04-29 Bioinformatics Original Papers
Oxford University Press 2016-05-01 2016-01-06 /pmc/articles/PMC4848399/ /pubmed/26740524 http://dx.doi.org/10.1093/bioinformatics/btv764 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Bayesian variable selection for binary outcomes in high-dimensional genomic studies using non-local priors
topic Original Papers