A scalable, knowledge-based analysis framework for genetic association studies

BACKGROUND: Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, inclu...

Bibliographic Details
Main authors: Baurley, James W; Conti, David V
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015032/
https://www.ncbi.nlm.nih.gov/pubmed/24152222
http://dx.doi.org/10.1186/1471-2105-14-312
author Baurley, James W
Conti, David V
collection PubMed
description BACKGROUND: Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, including using existing biological evidence as modeling priors and acknowledging that many models may fit the data well. With many candidate variables, Bayesian approaches to variable selection rely on algorithms to approximate the posterior distribution of models, such as Markov-Chain Monte Carlo (MCMC). Unfortunately, MCMC is difficult to parallelize and requires many iterations to adequately sample the posterior. We introduce a scalable algorithm called PEAK that improves the efficiency of MCMC by dividing a large set of variables into related groups using a rooted graph that resembles a mountain peak. Our algorithm takes advantage of parallel computing and existing biological databases when available.
RESULTS: By using graphs to manage a model space with more than 500,000 candidate variables, we were able to improve MCMC efficiency and uncover the true simulated causal variables, including a gene-gene interaction. We applied PEAK to a case-control study of childhood asthma with 2,521 genetic variants. We used an informative graph for oxidative stress derived from Gene Ontology and identified several variants in ERBB4, OXR1, and BCL2 with strong evidence for associations with childhood asthma.
CONCLUSIONS: We introduced an extremely flexible analysis framework capable of efficiently performing Bayesian variable selection on many candidate variables. The PEAK algorithm can be provided with an informative graph, which can be advantageous when considering gene-gene interactions, or a symmetric graph, which simply divides the model space into manageable regions. The PEAK framework is compatible with various model forms, allowing for the algorithm to be configured for different study designs and applications, such as pathway or rare-variant analyses, by simple modifications to the model likelihood and proposal functions.
format Online
Article
Text
id pubmed-4015032
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4015032 2014-05-23 A scalable, knowledge-based analysis framework for genetic association studies Baurley, James W Conti, David V BMC Bioinformatics Methodology Article BioMed Central 2013-10-23 /pmc/articles/PMC4015032/ /pubmed/24152222 http://dx.doi.org/10.1186/1471-2105-14-312 Text en Copyright © 2013 Baurley and Conti; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A scalable, knowledge-based analysis framework for genetic association studies
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015032/
https://www.ncbi.nlm.nih.gov/pubmed/24152222
http://dx.doi.org/10.1186/1471-2105-14-312