Stochastic model search with binary outcomes for genome-wide association studies

OBJECTIVE: The spread of case–control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection pro...

Bibliographic Details
Main Authors: Russu, Alberto; Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo
Format: Online Article Text
Language: English
Published: BMJ Group 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392850/
https://www.ncbi.nlm.nih.gov/pubmed/22534080
http://dx.doi.org/10.1136/amiajnl-2011-000741
author Russu, Alberto
Malovini, Alberto
Puca, Annibale A
Bellazzi, Riccardo
author_facet Russu, Alberto
Malovini, Alberto
Puca, Annibale A
Bellazzi, Riccardo
author_sort Russu, Alberto
collection PubMed
description OBJECTIVE: The spread of case–control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. MATERIALS AND METHODS: Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. RESULTS: BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precisions than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. DISCUSSION: BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. CONCLUSION: The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model.
format Online
Article
Text
id pubmed-3392850
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BMJ Group
record_format MEDLINE/PubMed
spelling pubmed-3392850 2012-07-10 Stochastic model search with binary outcomes for genome-wide association studies Russu, Alberto Malovini, Alberto Puca, Annibale A Bellazzi, Riccardo J Am Med Inform Assoc Research and Applications OBJECTIVE: The spread of case–control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. MATERIALS AND METHODS: Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. RESULTS: BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precisions than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. DISCUSSION: BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. CONCLUSION: The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model.
BMJ Group 2012-06 /pmc/articles/PMC3392850/ /pubmed/22534080 http://dx.doi.org/10.1136/amiajnl-2011-000741 Text en © 2012, Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.
spellingShingle Research and Applications
Russu, Alberto
Malovini, Alberto
Puca, Annibale A
Bellazzi, Riccardo
Stochastic model search with binary outcomes for genome-wide association studies
title Stochastic model search with binary outcomes for genome-wide association studies
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392850/
https://www.ncbi.nlm.nih.gov/pubmed/22534080
http://dx.doi.org/10.1136/amiajnl-2011-000741
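
The description in this record compares variable selection methods by precision, recall, and F-measure. As a minimal sketch of how those three scores are usually computed for a set of selected markers, the Python below uses the standard definitions (F-measure taken as the harmonic mean of precision and recall). The function name, the marker names, and the two example selections are hypothetical and chosen only for illustration; none of them come from the article, and the code does not reproduce BOSS or any of its results.

```python
def selection_scores(selected, truly_associated):
    """Return (precision, recall, F-measure) for a selected marker set.

    Standard definitions are assumed; the record does not state the
    exact formulas used by the authors.
    """
    selected = set(selected)
    truly_associated = set(truly_associated)
    true_positives = len(selected & truly_associated)

    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(truly_associated) if truly_associated else 0.0

    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical marker names, for illustration only.
truth = {"snp_a", "snp_b", "snp_c", "snp_d"}

# A parsimonious selection: few markers, all of them correct.
parsimonious = {"snp_a", "snp_b", "snp_c"}

# A liberal selection: every true marker, plus six false positives.
liberal = truth | {f"noise_{i}" for i in range(6)}

for name, chosen in [("parsimonious", parsimonious), ("liberal", liberal)]:
    p, r, f = selection_scores(chosen, truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

In this toy comparison the smaller selection misses one true marker yet scores a higher F-measure than the larger one, which illustrates why a method can trade a little recall for precision and still come out ahead on F-measure.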