Model selection based on logistic regression in a highly correlated candidate gene region

Our aim is to develop methods for identifying a (causal) variant or variants from a dense panel of single-nucleotide polymorphisms (SNPs) that are genotyped on the evidence of previous studies. Because a large number of SNPs are in close proximity to each other, the magnitude of linkage disequilibri...

Bibliographic Details
Main authors: Uh, Hae-Won, Mertens, Bart JA, Jan van der Wijk, Henk, Putter, Hein, van Houwelingen, Hans C, Houwing-Duistermaat, Jeanine J
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367469/
https://www.ncbi.nlm.nih.gov/pubmed/18466455
_version_ 1782154299270234112
author Uh, Hae-Won
Mertens, Bart JA
Jan van der Wijk, Henk
Putter, Hein
van Houwelingen, Hans C
Houwing-Duistermaat, Jeanine J
author_facet Uh, Hae-Won
Mertens, Bart JA
Jan van der Wijk, Henk
Putter, Hein
van Houwelingen, Hans C
Houwing-Duistermaat, Jeanine J
author_sort Uh, Hae-Won
collection PubMed
description Our aim is to develop methods for identifying a (causal) variant or variants from a dense panel of single-nucleotide polymorphisms (SNPs) that are genotyped on the evidence of previous studies. Because a large number of SNPs are in close proximity to each other, the magnitude of linkage disequilibrium (LD) plays an important role. Namely, highly correlated SNPs may hamper standard methods such as multivariate logistic regression due to multicollinearity between the covariates. Sequences of models with high dimension naturally raise questions about model selection strategies. We investigate three variable selection methods based on logistic regression. The penalties on stepwise selection were imposed using Akaike's information criterion (AIC), and using the lasso penalty. Finally, a Bayesian variable-selection logistic regression model was implemented. The methods are illustrated using the simulated dense SNPs including the causal DR/C locus on chromosome 6. We also evaluate model selection in terms of average prediction error across nine replicates. We conclude that for the Genetic Analysis Workshop 15 (GAW15) data, the newly developed Bayesian selection method performs well.
format Text
id pubmed-2367469
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-23674692008-05-06 Model selection based on logistic regression in a highly correlated candidate gene region Uh, Hae-Won Mertens, Bart JA Jan van der Wijk, Henk Putter, Hein van Houwelingen, Hans C Houwing-Duistermaat, Jeanine J BMC Proc Proceedings Our aim is to develop methods for identifying a (causal) variant or variants from a dense panel of single-nucleotide polymorphisms (SNPs) that are genotyped on the evidence of previous studies. Because a large number of SNPs are in close proximity to each other, the magnitude of linkage disequilibrium (LD) plays an important role. Namely, highly correlated SNPs may hamper standard methods such as multivariate logistic regression due to multicollinearity between the covariates. Sequences of models with high dimension naturally raise questions about model selection strategies. We investigate three variable selection methods based on logistic regression. The penalties on stepwise selection were imposed using Akaike's information criterion (AIC), and using the lasso penalty. Finally, a Bayesian variable-selection logistic regression model was implemented. The methods are illustrated using the simulated dense SNPs including the causal DR/C locus on chromosome 6. We also evaluate model selection in terms of average prediction error across nine replicates. We conclude that for the Genetic Analysis Workshop 15 (GAW15) data, the newly developed Bayesian selection method performs well. BioMed Central 2007-12-18 /pmc/articles/PMC2367469/ /pubmed/18466455 Text en Copyright © 2007 Uh et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Uh, Hae-Won
Mertens, Bart JA
Jan van der Wijk, Henk
Putter, Hein
van Houwelingen, Hans C
Houwing-Duistermaat, Jeanine J
Model selection based on logistic regression in a highly correlated candidate gene region
title Model selection based on logistic regression in a highly correlated candidate gene region
title_full Model selection based on logistic regression in a highly correlated candidate gene region
title_fullStr Model selection based on logistic regression in a highly correlated candidate gene region
title_full_unstemmed Model selection based on logistic regression in a highly correlated candidate gene region
title_short Model selection based on logistic regression in a highly correlated candidate gene region
title_sort model selection based on logistic regression in a highly correlated candidate gene region
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367469/
https://www.ncbi.nlm.nih.gov/pubmed/18466455
work_keys_str_mv AT uhhaewon modelselectionbasedonlogisticregressioninahighlycorrelatedcandidategeneregion
AT mertensbartja modelselectionbasedonlogisticregressioninahighlycorrelatedcandidategeneregion
AT janvanderwijkhenk modelselectionbasedonlogisticregressioninahighlycorrelatedcandidategeneregion
AT putterhein modelselectionbasedonlogisticregressioninahighlycorrelatedcandidategeneregion
AT vanhouwelingenhansc modelselectionbasedonlogisticregressioninahighlycorrelatedcandidategeneregion
AT houwingduistermaatjeaninej modelselectionbasedonlogisticregressioninahighlycorrelatedcandidategeneregion