
Efficient penalized generalized linear mixed models for variable selection and genetic risk prediction in high-dimensional data

MOTIVATION: Sparse regularized regression methods are now widely used in genome-wide association studies (GWAS) to address the multiple testing burden that limits discovery of potentially important predictors. Linear mixed models (LMMs) have become an attractive alternative to principal components (...


Bibliographic Details
Main Authors:	St-Pierre, Julien; Oualkacha, Karim; Bhatnagar, Sahir Rai
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9907224/ https://www.ncbi.nlm.nih.gov/pubmed/36708013 http://dx.doi.org/10.1093/bioinformatics/btad063

author	St-Pierre, Julien; Oualkacha, Karim; Bhatnagar, Sahir Rai
collection	PubMed
description	MOTIVATION: Sparse regularized regression methods are now widely used in genome-wide association studies (GWAS) to address the multiple testing burden that limits discovery of potentially important predictors. Linear mixed models (LMMs) have become an attractive alternative to principal components (PCs) adjustment to account for population structure and relatedness in high-dimensional penalized models. However, their use in binary trait GWAS relies on the invalid assumption that the residual variance does not depend on the estimated regression coefficients. Moreover, LMMs use a single spectral decomposition of the covariance matrix of the responses, which is no longer possible in generalized linear mixed models (GLMMs). RESULTS: We introduce a new method called pglmm, a penalized GLMM that simultaneously selects genetic markers and estimates their effects while accounting for between-individual correlations and the binary nature of the trait. We develop a computationally efficient algorithm based on penalized quasi-likelihood estimation that allows regularized mixed models to scale to high-dimensional binary trait GWAS. We show through simulations that when the dimensionality of the relatedness matrix is high, penalized LMM and logistic regression with PC adjustment fail to select important predictors and have inferior prediction accuracy compared to pglmm. Further, we demonstrate through the analysis of two polygenic binary traits in a subset of 6731 related individuals from the UK Biobank data with 320K SNPs that our method can achieve higher predictive performance while also selecting fewer predictors than a sparse regularized logistic lasso with PC adjustment. AVAILABILITY AND IMPLEMENTATION: Our Julia package PenalizedGLMM.jl is publicly available on GitHub: https://github.com/julstpierre/PenalizedGLMM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-9907224
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9907224 2023-02-09 Bioinformatics Original Paper Oxford University Press 2023-01-27 /pmc/articles/PMC9907224/ /pubmed/36708013 http://dx.doi.org/10.1093/bioinformatics/btad063 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Efficient penalized generalized linear mixed models for variable selection and genetic risk prediction in high-dimensional data
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9907224/ https://www.ncbi.nlm.nih.gov/pubmed/36708013 http://dx.doi.org/10.1093/bioinformatics/btad063