A scalable, knowledge-based analysis framework for genetic association studies
BACKGROUND: Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, inclu...
Main Authors: | Baurley, James W, Conti, David V |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015032/ https://www.ncbi.nlm.nih.gov/pubmed/24152222 http://dx.doi.org/10.1186/1471-2105-14-312 |
_version_ | 1782315275593449472 |
---|---|
author | Baurley, James W Conti, David V |
author_facet | Baurley, James W Conti, David V |
author_sort | Baurley, James W |
collection | PubMed |
description | BACKGROUND: Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, including using existing biological evidence as modeling priors and acknowledging that many models may fit the data well. With many candidate variables, Bayesian approaches to variable selection rely on algorithms to approximate the posterior distribution of models, such as Markov-Chain Monte Carlo (MCMC). Unfortunately, MCMC is difficult to parallelize and requires many iterations to adequately sample the posterior. We introduce a scalable algorithm called PEAK that improves the efficiency of MCMC by dividing a large set of variables into related groups using a rooted graph that resembles a mountain peak. Our algorithm takes advantage of parallel computing and existing biological databases when available. RESULTS: By using graphs to manage a model space with more than 500,000 candidate variables, we were able to improve MCMC efficiency and uncover the true simulated causal variables, including a gene-gene interaction. We applied PEAK to a case-control study of childhood asthma with 2,521 genetic variants. We used an informative graph for oxidative stress derived from Gene Ontology and identified several variants in ERBB4, OXR1, and BCL2 with strong evidence for associations with childhood asthma. CONCLUSIONS: We introduced an extremely flexible analysis framework capable of efficiently performing Bayesian variable selection on many candidate variables. The PEAK algorithm can be provided with an informative graph, which can be advantageous when considering gene-gene interactions, or a symmetric graph, which simply divides the model space into manageable regions. The PEAK framework is compatible with various model forms, allowing for the algorithm to be configured for different study designs and applications, such as pathway or rare-variant analyses, by simple modifications to the model likelihood and proposal functions. |
format | Online Article Text |
id | pubmed-4015032 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-40150322014-05-23 A scalable, knowledge-based analysis framework for genetic association studies Baurley, James W Conti, David V BMC Bioinformatics Methodology Article BACKGROUND: Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, including using existing biological evidence as modeling priors and acknowledging that many models may fit the data well. With many candidate variables, Bayesian approaches to variable selection rely on algorithms to approximate the posterior distribution of models, such as Markov-Chain Monte Carlo (MCMC). Unfortunately, MCMC is difficult to parallelize and requires many iterations to adequately sample the posterior. We introduce a scalable algorithm called PEAK that improves the efficiency of MCMC by dividing a large set of variables into related groups using a rooted graph that resembles a mountain peak. Our algorithm takes advantage of parallel computing and existing biological databases when available. RESULTS: By using graphs to manage a model space with more than 500,000 candidate variables, we were able to improve MCMC efficiency and uncover the true simulated causal variables, including a gene-gene interaction. We applied PEAK to a case-control study of childhood asthma with 2,521 genetic variants. We used an informative graph for oxidative stress derived from Gene Ontology and identified several variants in ERBB4, OXR1, and BCL2 with strong evidence for associations with childhood asthma. CONCLUSIONS: We introduced an extremely flexible analysis framework capable of efficiently performing Bayesian variable selection on many candidate variables. The PEAK algorithm can be provided with an informative graph, which can be advantageous when considering gene-gene interactions, or a symmetric graph, which simply divides the model space into manageable regions. The PEAK framework is compatible with various model forms, allowing for the algorithm to be configured for different study designs and applications, such as pathway or rare-variant analyses, by simple modifications to the model likelihood and proposal functions. BioMed Central 2013-10-23 /pmc/articles/PMC4015032/ /pubmed/24152222 http://dx.doi.org/10.1186/1471-2105-14-312 Text en Copyright © 2013 Baurley and Conti; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Baurley, James W Conti, David V A scalable, knowledge-based analysis framework for genetic association studies |
title | A scalable, knowledge-based analysis framework for genetic association studies |
title_full | A scalable, knowledge-based analysis framework for genetic association studies |
title_fullStr | A scalable, knowledge-based analysis framework for genetic association studies |
title_full_unstemmed | A scalable, knowledge-based analysis framework for genetic association studies |
title_short | A scalable, knowledge-based analysis framework for genetic association studies |
title_sort | scalable, knowledge-based analysis framework for genetic association studies |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015032/ https://www.ncbi.nlm.nih.gov/pubmed/24152222 http://dx.doi.org/10.1186/1471-2105-14-312 |
work_keys_str_mv | AT baurleyjamesw ascalableknowledgebasedanalysisframeworkforgeneticassociationstudies AT contidavidv ascalableknowledgebasedanalysisframeworkforgeneticassociationstudies AT baurleyjamesw scalableknowledgebasedanalysisframeworkforgeneticassociationstudies AT contidavidv scalableknowledgebasedanalysisframeworkforgeneticassociationstudies |