Reprioritizing Genetic Associations in Hit Regions Using LASSO-Based Resample Model Averaging

Bibliographic Details
Main Authors: Valdar, William; Sabourin, Jeremy; Nobel, Andrew; Holmes, Christopher C
Format: Online, Article, Text
Language: English
Published: Blackwell Publishing Ltd, 2012
Subjects: Research Articles
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3470705/
https://www.ncbi.nlm.nih.gov/pubmed/22549815
http://dx.doi.org/10.1002/gepi.21639

Collection: PubMed
Abstract: Significance testing one SNP at a time has proven useful for identifying genomic regions that harbor variants affecting human disease. But after an initial genome scan has identified a "hit region" of association, single-locus approaches can falter. Local linkage disequilibrium (LD) can make both the number of underlying true signals and their identities ambiguous. Simultaneous modeling of multiple loci should help. However, it is typically applied ad hoc: conditioning on the top SNPs, with limited exploration of the model space and no assessment of how sensitive model choice was to sampling variability. Formal alternatives exist but are seldom used. Bayesian variable selection is coherent but requires specifying a full joint model, including priors on parameters and the model space. Penalized regression methods (e.g., LASSO) appear promising but require calibration, and, once calibrated, lead to a choice of SNPs that can be misleadingly decisive. We present a general method for characterizing uncertainty in model choice that is tailored to reprioritizing SNPs within a hit region under strong LD. Our method, LASSO local automatic regularization resample model averaging (LLARRMA), combines LASSO shrinkage with resample model averaging and multiple imputation, estimating for each SNP the probability that it would be included in a multi-SNP model in alternative realizations of the data. We apply LLARRMA to simulations based on case-control genome-wide association study data, and find that when there are several causal loci and strong LD, LLARRMA identifies a set of candidates that is enriched for true signals relative to single-locus analysis and to the recently proposed method of Stability Selection. Genet. Epidemiol. 36:451–462, 2012. © 2012 Wiley Periodicals, Inc.
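
The abstract describes LLARRMA as combining LASSO shrinkage with resample model averaging, reporting for each SNP how often it enters a multi-SNP model across alternative realizations of the data. The sketch below illustrates only that resampling idea and is not the authors' implementation. It assumes: a Gaussian (linear) LASSO fitted by coordinate descent stands in for a case-control model; the penalty `lam` is fixed by hand rather than calibrated automatically; bootstrap resamples play the role of alternative data realizations; and multiple imputation is omitted. The function names (`soft_threshold`, `lasso_cd`, `resample_inclusion_probs`, `draw_genotype`) and the toy genotype data are hypothetical.

```python
import random
from typing import List, Optional


def soft_threshold(z: float, lam: float) -> float:
    """Soft-thresholding operator S(z, lam) used in LASSO coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             n_iter: int = 100, tol: float = 1e-6) -> List[float]:
    """Gaussian LASSO by cyclic coordinate descent (simplifying assumption).

    Minimizes (1/2n)||y - Xb||^2 + lam*||b||_1 after centering y and
    standardizing each SNP column; monomorphic columns keep b_j = 0.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual y - Xb, with b = 0 initially
    cols: List[Optional[List[float]]] = []
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mu) / sd for v in col] if sd > 0 else None)
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j, col in enumerate(cols):
            if col is None:
                continue
            # Inner product of standardized SNP j with the partial residual
            # (current residual with SNP j's own contribution added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def resample_inclusion_probs(X, y, lam, n_resamples=50, seed=0):
    """Fraction of bootstrap resamples in which each SNP gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]


# Toy usage (hypothetical data): SNP 0 is causal, SNP 1 is a proxy in strong
# LD with SNP 0, and SNPs 2-4 are unlinked noise.
rng = random.Random(1)
n = 200


def draw_genotype(maf: float = 0.3) -> int:
    return sum(rng.random() < maf for _ in range(2))  # minor-allele count 0, 1 or 2


g0 = [draw_genotype() for _ in range(n)]
g1 = [g if rng.random() < 0.9 else draw_genotype() for g in g0]
X = [[g0[i], g1[i]] + [draw_genotype() for _ in range(3)] for i in range(n)]
y = [0.8 * g0[i] + rng.gauss(0.0, 1.0) for i in range(n)]
probs = resample_inclusion_probs(X, y, lam=0.2, n_resamples=50, seed=2)
print([round(q, 2) for q in probs])
```

Reporting an inclusion frequency per SNP, rather than the single set chosen by one LASSO fit, is what lets a sketch like this express uncertainty in model choice: when two SNPs carry nearly the same information, inclusion can be shared between them instead of being assigned decisively to one. The exact frequencies depend on the seeds, `lam`, and the toy data.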
Record ID: pubmed-3470705
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Genet Epidemiol (Research Articles), Blackwell Publishing Ltd; issue date 2012-07; published online 2012-04-30; record date 2012-10-18
License: © 2012 Wiley Periodicals, Inc. Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5 (http://creativecommons.org/licenses/by/2.5/), which does not permit commercial exploitation.