Stochastic model search with binary outcomes for genome-wide association studies
OBJECTIVE: The spread of case–control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection pro...
Main Authors: | Russu, Alberto; Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BMJ Group 2012 |
Subjects: | Research and Applications |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392850/ https://www.ncbi.nlm.nih.gov/pubmed/22534080 http://dx.doi.org/10.1136/amiajnl-2011-000741 |
_version_ | 1782237658564526080 |
---|---|
author | Russu, Alberto Malovini, Alberto Puca, Annibale A Bellazzi, Riccardo |
author_sort | Russu, Alberto |
collection | PubMed |
description | OBJECTIVE: The spread of case–control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. MATERIALS AND METHODS: Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. RESULTS: BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precision than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. DISCUSSION: BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. CONCLUSION: The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results show that BOSS is effective in detecting relevant markers while providing a parsimonious model. |
format | Online Article Text |
id | pubmed-3392850 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BMJ Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-3392850 2012-07-10 Stochastic model search with binary outcomes for genome-wide association studies Russu, Alberto Malovini, Alberto Puca, Annibale A Bellazzi, Riccardo J Am Med Inform Assoc Research and Applications BMJ Group 2012-06 /pmc/articles/PMC3392850/ /pubmed/22534080 http://dx.doi.org/10.1136/amiajnl-2011-000741 Text en © 2012, Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode. |
title | Stochastic model search with binary outcomes for genome-wide association studies |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392850/ https://www.ncbi.nlm.nih.gov/pubmed/22534080 http://dx.doi.org/10.1136/amiajnl-2011-000741 |
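
The description above mentions two quantities: the posterior probability of each predictor, estimated from Markov chain Monte Carlo output, and the precision, recall, and F-measure used to compare methods on simulated data. The following is a minimal sketch of how such quantities are commonly computed. It is not the authors' implementation: the function names, the toy marker names, and the 0.5 selection threshold are all assumptions made for illustration, since this record does not state them.

```python
from typing import Dict, Iterable, List, Set, Tuple


def inclusion_probabilities(draws: List[Set[str]],
                            markers: Iterable[str]) -> Dict[str, float]:
    """Estimate each marker's posterior inclusion probability as the
    fraction of sampled models that contain it."""
    n = len(draws)
    return {m: sum(m in d for d in draws) / n for m in markers}


def precision_recall_f(selected: Set[str],
                       truth: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure (harmonic mean of the two)
    of a selected marker set against the known causal markers."""
    tp = len(selected & truth)  # correctly selected markers
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical toy data: four markers and four sampled models.
markers = ["snp1", "snp2", "snp3", "snp4"]
draws = [{"snp1", "snp2"}, {"snp1"}, {"snp1", "snp3"}, {"snp1", "snp2"}]

probs = inclusion_probabilities(draws, markers)

# Assumed rule: keep markers whose estimated probability exceeds 0.5.
selected = {m for m, p in probs.items() if p > 0.5}

# In a simulated benchmark the truly associated markers are known.
truth = {"snp1", "snp2"}
precision, recall, f_measure = precision_recall_f(selected, truth)
```

With these toy draws only `snp1` passes the threshold, so the selection is fully precise but recovers half of the true markers. This illustrates the trade-off between a parsimonious model and recall that the description refers to.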