Logistic Bayesian LASSO for Genetic Association Analysis of Data from Complex Sampling Designs

Bibliographic Details
Main Authors: Zhang, Yuan; Hofmann, Jonathan N.; Purdue, Mark P.; Lin, Shili; Biswas, Swati
Format: Online Article Text
Language: English
Published: 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5572548/
https://www.ncbi.nlm.nih.gov/pubmed/28424482
http://dx.doi.org/10.1038/jhg.2017.43
Description
Summary: Detecting gene-environment interactions (GXE) with rare variants is critical in dissecting the etiology of common diseases. Interactions with rare haplotype variants (rHTV) are of particular interest. At the same time, complex sampling designs, such as stratified random sampling, are becoming increasingly popular for designing case-control studies, especially for recruiting controls. The US Kidney Cancer Study (KCS) is an example, wherein all available cases were included while the controls at each site were randomly selected from the population by frequency matching with cases based on age, sex, and race. There is currently no rHTV association method that can account for such a complex sampling design. To fill this gap, we consider logistic Bayesian LASSO (LBL), an existing rHTV approach for case-control data, and show that its model can easily accommodate the complex sampling design. We study two extensions that include stratifying variables either as main effects only or with additional modeling of their interactions with haplotypes. We conduct extensive simulation studies to compare the complex sampling methods with the original LBL methods. We find that when there is no interaction between haplotype and stratifying variables, both extensions perform well while the original LBL methods lead to inflated type I error rates. However, when such an interaction exists, it is necessary to include the interaction effect in the model to control the type I error. Finally, we analyze the KCS data and find a significant interaction between (current) smoking and a specific rHTV in the N-acetyltransferase 2 (NAT2) gene.
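
Illustrative note: the two extensions named in the summary can be sketched as logistic regression models. The notation here is an assumption made for illustration and is not taken from the article: $Y$ is case status, $X_H$ a haplotype design vector, $E$ the environmental covariate (for example, current smoking), and $S$ the vector of stratifying variables (for KCS: age, sex, and race). The article's exact likelihood and priors may differ from this simplified form.

```latex
% Extension 1 (assumed notation): stratifying variables as main effects only
\[
\operatorname{logit} P(Y = 1 \mid X_H, E, S)
  = \alpha + X_H^{\top}\beta_H + \beta_E E + E\, X_H^{\top}\beta_{HE} + S^{\top}\gamma
\]

% Extension 2 (assumed notation): adds haplotype-by-stratum interactions
\[
\operatorname{logit} P(Y = 1 \mid X_H, E, S)
  = \alpha + X_H^{\top}\beta_H + \beta_E E + E\, X_H^{\top}\beta_{HE} + S^{\top}\gamma
    + (X_H \otimes S)^{\top}\delta
\]
```

In a Bayesian LASSO, regression coefficients generally receive double-exponential (Laplace) priors that shrink weak effects toward zero; which coefficients are penalized in each extension is not stated in the summary. Under this sketch, the reported finding corresponds to needing the $\delta$ term whenever haplotype effects differ across strata.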