Penalized regression and model selection methods for polygenic scores on summary statistics

Polygenic scores quantify the genetic risk associated with a given phenotype and are widely used to predict the risk of complex diseases. There has been recent interest in developing methods to construct polygenic risk scores using summary statistic data. We propose a method to construct polygenic r...

Bibliographic Details
Main authors:	Pattee, Jack; Pan, Wei
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7553329/ https://www.ncbi.nlm.nih.gov/pubmed/33001975 http://dx.doi.org/10.1371/journal.pcbi.1008271

author	Pattee, Jack; Pan, Wei
collection	PubMed
description	Polygenic scores quantify the genetic risk associated with a given phenotype and are widely used to predict the risk of complex diseases. There has been recent interest in developing methods to construct polygenic risk scores using summary statistic data. We propose a method to construct polygenic risk scores via penalized regression using summary statistic data and publicly available reference data. Our method is similar to the existing method LassoSum and extends its framework to the Truncated Lasso Penalty (TLP) and the elastic net. We show via simulation and real data application that the TLP improves predictive accuracy compared with the LASSO while imposing additional sparsity where appropriate. To facilitate model selection in the absence of validation data, we propose methods for estimating the model-fitting criteria AIC and BIC. These methods approximate the AIC and BIC when a polygenic risk score has been estimated on summary statistic data and no validation data are available. Additionally, we propose the so-called quasi-correlation metric, which quantifies the predictive accuracy of a polygenic risk score applied to out-of-sample data for which only summary statistic information is available. Together, these methods facilitate estimation and model selection of polygenic risk scores on summary statistic data, and the application of these polygenic risk scores to out-of-sample data for which only summary statistic information is available. We demonstrate the utility of these methods by applying them to GWA studies of lipids, height, and lung cancer.
format	Online Article Text
id	pubmed-7553329
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-7553329 2020-10-21 Penalized regression and model selection methods for polygenic scores on summary statistics Pattee, Jack; Pan, Wei PLoS Comput Biol Research Article Public Library of Science 2020-10-01 /pmc/articles/PMC7553329/ /pubmed/33001975 http://dx.doi.org/10.1371/journal.pcbi.1008271 Text en © 2020 Pattee, Pan http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Penalized regression and model selection methods for polygenic scores on summary statistics
topic	Research Article
