Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity

BACKGROUND: Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only uni...

Bibliographic Details
Main authors:	Chu, Benjamin B; Keys, Kevin L; German, Christopher A; Zhou, Hua; Zhou, Jin J; Sobel, Eric M; Sinsheimer, Janet S; Lange, Kenneth
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Technical Note
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7268817/ https://www.ncbi.nlm.nih.gov/pubmed/32491161 http://dx.doi.org/10.1093/gigascience/giaa044

author	Chu, Benjamin B; Keys, Kevin L; German, Christopher A; Zhou, Hua; Zhou, Jin J; Sobel, Eric M; Sinsheimer, Janet S; Lange, Kenneth
author_sort	Chu, Benjamin B
collection	PubMed
description	BACKGROUND: Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression. RESULTS: We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously. Our extensions accommodate generalized linear models, prior information on genetic variants, and grouping of variants. In our simulations, IHT recovers up to 30% more true predictors than SNP-by-SNP association testing and exhibits a 2–3 orders of magnitude decrease in false-positive rates compared with lasso regression. We also test IHT on the UK Biobank hypertension phenotypes and the Northern Finland Birth Cohort of 1966 cardiovascular phenotypes. We find that IHT scales to the large datasets of contemporary human genetics and recovers the plausible genetic variants identified by previous studies. CONCLUSIONS: Our real data analysis and simulation studies suggest that IHT can (i) recover highly correlated predictors, (ii) avoid over-fitting, (iii) deliver better true-positive and false-positive rates than either marginal testing or lasso regression, (iv) recover unbiased regression coefficients, (v) exploit prior information and group-sparsity, and (vi) be used with biobank-sized datasets. Although these advances are studied for genome-wide association studies inference, our extensions are pertinent to other regression problems with large numbers of predictors.
format	Online Article Text
id	pubmed-7268817
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7268817 2020-06-09 Gigascience Technical Note Oxford University Press 2020-06-03 /pmc/articles/PMC7268817/ /pubmed/32491161 http://dx.doi.org/10.1093/gigascience/giaa044 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7268817/ https://www.ncbi.nlm.nih.gov/pubmed/32491161 http://dx.doi.org/10.1093/gigascience/giaa044
