A scalable, knowledge-based analysis framework for genetic association studies

BACKGROUND: Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, inclu...

Bibliographic Details
Main Authors:	Baurley, James W, Conti, David V
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015032/ https://www.ncbi.nlm.nih.gov/pubmed/24152222 http://dx.doi.org/10.1186/1471-2105-14-312

author	Baurley, James W Conti, David V
collection	PubMed
description	BACKGROUND: Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, including using existing biological evidence as modeling priors and acknowledging that many models may fit the data well. With many candidate variables, Bayesian approaches to variable selection rely on algorithms to approximate the posterior distribution of models, such as Markov-Chain Monte Carlo (MCMC). Unfortunately, MCMC is difficult to parallelize and requires many iterations to adequately sample the posterior. We introduce a scalable algorithm called PEAK that improves the efficiency of MCMC by dividing a large set of variables into related groups using a rooted graph that resembles a mountain peak. Our algorithm takes advantage of parallel computing and existing biological databases when available. RESULTS: By using graphs to manage a model space with more than 500,000 candidate variables, we were able to improve MCMC efficiency and uncover the true simulated causal variables, including a gene-gene interaction. We applied PEAK to a case-control study of childhood asthma with 2,521 genetic variants. We used an informative graph for oxidative stress derived from Gene Ontology and identified several variants in ERBB4, OXR1, and BCL2 with strong evidence for associations with childhood asthma. CONCLUSIONS: We introduced an extremely flexible analysis framework capable of efficiently performing Bayesian variable selection on many candidate variables. The PEAK algorithm can be provided with an informative graph, which can be advantageous when considering gene-gene interactions, or a symmetric graph, which simply divides the model space into manageable regions. The PEAK framework is compatible with various model forms, allowing for the algorithm to be configured for different study designs and applications, such as pathway or rare-variant analyses, by simple modifications to the model likelihood and proposal functions.
format	Online Article Text
id	pubmed-4015032
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4015032 2014-05-23 A scalable, knowledge-based analysis framework for genetic association studies Baurley, James W Conti, David V BMC Bioinformatics Methodology Article BioMed Central 2013-10-23 /pmc/articles/PMC4015032/ /pubmed/24152222 http://dx.doi.org/10.1186/1471-2105-14-312 Text en Copyright © 2013 Baurley and Conti; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A scalable, knowledge-based analysis framework for genetic association studies
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015032/ https://www.ncbi.nlm.nih.gov/pubmed/24152222 http://dx.doi.org/10.1186/1471-2105-14-312
