SNP Selection in Genome-Wide and Candidate Gene Studies via Penalized Logistic Regression

Penalized regression methods offer an attractive alternative to single marker testing in genetic association analysis. These methods shrink to zero the coefficients of markers that have little apparent effect on the trait of interest, resulting in a parsimonious subset of what we...

Bibliographic Details
Main authors: Ayers, Kristin L, Cordell, Heather J
Format: Online Article Text
Language: English
Published: Wiley Subscription Services, Inc., A Wiley Company 2010
Subjects: Original Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3410531/
https://www.ncbi.nlm.nih.gov/pubmed/21104890
http://dx.doi.org/10.1002/gepi.20543
author Ayers, Kristin L
Cordell, Heather J
collection PubMed
description Penalized regression methods offer an attractive alternative to single marker testing in genetic association analysis. These methods shrink to zero the coefficients of markers that have little apparent effect on the trait of interest, resulting in a parsimonious subset of what we hope are true pertinent predictors. Here we explore the performance of penalization in selecting SNPs as predictors in genetic association studies. The strength of the penalty can be chosen to select a good predictive model (via methods such as computationally expensive cross-validation), through a maximum likelihood-based model selection criterion (such as the BIC), or to select a model that controls for type I error, as done here. We have investigated the performance of several penalized logistic regression approaches, simulating data under a variety of disease locus effect sizes and linkage disequilibrium patterns. We compared several penalties, including the elastic net, ridge, Lasso, MCP and the normal-exponential-γ shrinkage prior implemented in the hyperlasso software, to standard single locus analysis and simple forward stepwise regression. We examined how markers enter the model as penalties and P-value thresholds are varied, and report the sensitivity and specificity of each of the methods. Results show that penalized methods outperform single marker analysis, with the main difference being that penalized methods allow the simultaneous inclusion of a number of markers, and generally do not allow correlated variables to enter the model, producing a sparse model in which most of the identified explanatory markers are accounted for. Genet. Epidemiol. 34:879–891, 2010. © 2010 Wiley-Liss, Inc.
format Online
Article
Text
id pubmed-3410531
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Wiley Subscription Services, Inc., A Wiley Company
record_format MEDLINE/PubMed
spelling pubmed-3410531 2012-08-02 Genet Epidemiol Original Articles Wiley Subscription Services, Inc., A Wiley Company 2010-12 2010-11-18 /pmc/articles/PMC3410531/ /pubmed/21104890 http://dx.doi.org/10.1002/gepi.20543 Text en © 2010 Wiley-Liss, Inc. http://creativecommons.org/licenses/by/2.5/ Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.
title SNP Selection in Genome-Wide and Candidate Gene Studies via Penalized Logistic Regression
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3410531/
https://www.ncbi.nlm.nih.gov/pubmed/21104890
http://dx.doi.org/10.1002/gepi.20543
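
The description above says that penalized regression shrinks to zero the coefficients of markers with little apparent effect. The following is a minimal sketch of that idea for the Lasso penalty only, using proximal gradient descent with soft-thresholding. It is not the authors' implementation or the hyperlasso software. The function names (`simulate`, `lasso_logistic`, `soft_threshold`), the toy data (independent SNPs coded 0/1/2 with one causal marker), the step size, and the penalty values are all assumptions chosen for illustration. In particular, the penalty values here are arbitrary and are not tuned to control type I error as in the article.

```python
import math
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u):
    """Logistic function, written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def simulate(n=200, p=10, causal=0, effect=1.0, maf=0.3, seed=1):
    """Hypothetical toy data: p independent SNPs coded 0/1/2, one causal SNP."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        # Each genotype is the sum of two independent allele draws.
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(p)]
        prob = sigmoid(-1.0 + effect * g[causal])
        X.append(g)
        y.append(1 if rng.random() < prob else 0)
    return X, y


def lasso_logistic(X, y, lam, step=0.05, iters=200):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept b0 is left unpenalized; each SNP coefficient takes a
    gradient step and is then soft-thresholded by step * lam.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals: fitted probability minus observed case/control status.
        resid = [
            sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * g0
        beta = [
            soft_threshold(bj - step * gj, step * lam)
            for bj, gj in zip(beta, grad)
        ]
    return b0, beta


X, y = simulate()
for lam in (0.0, 0.05, 10.0):
    _, beta = lasso_logistic(X, y, lam)
    # Report how many SNPs remain in the model at each penalty strength.
    print(lam, sum(b != 0.0 for b in beta))
```

With genotypes coded 0/1/2, each coordinate of the averaged gradient is bounded by 2 in absolute value, so a penalty of 10 keeps every SNP coefficient at exactly zero. Smaller penalties let markers enter the model.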