
Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models

Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed effects models (LMM) can ac...


Bibliographic Details
Main authors: Bhatnagar, Sahir R., Yang, Yi, Lu, Tianyuan, Schurr, Erwin, Loredo-Osti, JC, Forest, Marie, Oualkacha, Karim, Greenwood, Celia M. T.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7224575/
https://www.ncbi.nlm.nih.gov/pubmed/32365090
http://dx.doi.org/10.1371/journal.pgen.1008766
author Bhatnagar, Sahir R.
Yang, Yi
Lu, Tianyuan
Schurr, Erwin
Loredo-Osti, JC
Forest, Marie
Oualkacha, Karim
Greenwood, Celia M. T.
collection PubMed
description Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed effects models (LMM) can account for correlations due to relatedness but have not been applicable in high-dimensional (HD) settings where the number of fixed effect predictors greatly exceeds the number of samples. False positives or false negatives can result from two-stage approaches, where the residuals estimated from a null model adjusted for the subjects’ relationship structure are subsequently used as the response in a standard penalized regression model. To overcome these challenges, we develop ggmix, a general penalized LMM with a single random effect, for simultaneous SNP selection and adjustment for population structure in high dimensional prediction models. We develop a blockwise coordinate descent algorithm with automatic tuning parameter selection which is highly scalable, computationally efficient and has theoretical guarantees of convergence. Through simulations and three real data examples, we show that ggmix yields more parsimonious models with better prediction accuracy than the two-stage approach or principal component adjustment. Our method performs well even in the presence of highly correlated markers, and when the causal SNPs are included in the kinship matrix. ggmix can be used to construct polygenic risk scores and select instrumental variables in Mendelian randomization studies. Our algorithms are implemented in an R package available on CRAN (https://cran.r-project.org/package=ggmix).
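The idea of a penalized LMM with a single random effect can be illustrated with a minimal Python sketch. This is not the ggmix implementation, which is an R package and uses a blockwise algorithm. The sketch assumes a known kinship matrix `K` and, as a simplification, holds the variance-ratio parameter `eta` fixed instead of estimating it. It rotates the data by the eigenvectors of `K` so the observations become uncorrelated, then fits a weighted lasso by coordinate descent. The function names, the parameterization of the covariance as proportional to `eta * K + (1 - eta) * I`, and the omission of an intercept are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def weighted_lasso_cd(X, y, w, lam, n_iter=200):
    """Minimize (1/2n) * sum_i w_i * (y_i - x_i' b)^2 + lam * ||b||_1
    by cyclic coordinate descent (no intercept, for brevity)."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()  # residual y - X @ beta, with beta = 0
    col_norm = (w[:, None] * X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_norm[j] == 0.0:
                continue
            # Weighted correlation of column j with the partial residual.
            rho = (w * X[:, j] * (resid + X[:, j] * beta[j])).sum() / n
            new = soft_threshold(rho, lam) / col_norm[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def penalized_lmm_fixed_eta(X, y, K, eta, lam):
    """Hypothetical helper: lasso-penalized LMM fit with eta held fixed.

    Assumes Cov(y) is proportional to eta * K + (1 - eta) * I, 0 <= eta < 1.
    """
    d, U = np.linalg.eigh(K)      # K = U diag(d) U'
    d = np.clip(d, 0.0, None)     # guard against tiny negative eigenvalues
    Xr, yr = U.T @ X, U.T @ y     # rotated data have diagonal covariance
    w = 1.0 / (eta * d + (1.0 - eta))  # inverse-variance weights
    return weighted_lasso_cd(Xr, yr, w, lam)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 50, 10
    G = rng.standard_normal((n, 100))   # stand-in for genotypes
    K = G @ G.T / 100                   # illustrative kinship matrix
    X = rng.standard_normal((n, p))
    y = 1.5 * X[:, 0] + rng.standard_normal(n)
    print(penalized_lmm_fixed_eta(X, y, K, eta=0.3, lam=0.1))
```

Because selection and the correlation adjustment happen in one weighted fit, this differs from a two-stage approach, in which residuals from a null model would be passed to an ordinary lasso. In practice `eta`, the residual variance, and `lam` would all need to be estimated or tuned, which the sketch leaves out.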
format Online
Article
Text
id pubmed-7224575
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7224575 2020-06-01 PLoS Genet Research Article Public Library of Science 2020-05-04 /pmc/articles/PMC7224575/ /pubmed/32365090 http://dx.doi.org/10.1371/journal.pgen.1008766 Text en © 2020 Bhatnagar et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models
topic Research Article