Cargando…

Practical Issues in Screening and Variable Selection in Genome-Wide Association Analysis

Variable selection methods play an important role in high-dimensional statistical modeling and analysis. Computational cost and estimation accuracy are the two main concerns for statistical inference from ultrahigh-dimensional data. In particular, genome-wide association studies (GWAS), which focus...

Descripción completa

Detalles Bibliográficos
Autores principales: Hong, Sungyeon, Kim, Yongkang, Park, Taesung
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Libertas Academica 2015
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4298256/
https://www.ncbi.nlm.nih.gov/pubmed/25635166
http://dx.doi.org/10.4137/CIN.S16350
Descripción
Sumario:Variable selection methods play an important role in high-dimensional statistical modeling and analysis. Computational cost and estimation accuracy are the two main concerns for statistical inference from ultrahigh-dimensional data. In particular, genome-wide association studies (GWAS), which focus on identifying single nucleotide polymorphisms (SNPs) associated with a disease of interest, have produced ultrahigh-dimensional data. Numerous methods have been proposed to handle GWAS data. Most statistical methods have adopted a two-stage approach: pre-screening for dimensional reduction and variable selection to identify causal SNPs. The pre-screening step selects SNPs in terms of their P-values or the absolute values of the regression coefficients in single SNP analysis. Penalized regressions, such as the ridge, lasso, adaptive lasso, and elastic-net regressions, are commonly used for the variable selection step. In this paper, we investigate which combination of pre-screening method and penalized regression performs best on a quantitative phenotype using two real GWAS datasets.