Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations
Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in...
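Main authors: | Elgart, Michael; Lyons, Genevieve; Romero-Brufau, Santiago; Kurniansyah, Nuzulul; Brody, Jennifer A.; Guo, Xiuqing; Lin, Henry J.; Raffield, Laura; Gao, Yan; Chen, Han; de Vries, Paul; Lloyd-Jones, Donald M.; Lange, Leslie A.; Peloso, Gina M.; Fornage, Myriam; Rotter, Jerome I.; Rich, Stephen S.; Morrison, Alanna C.; Psaty, Bruce M.; Levy, Daniel; Redline, Susan; Sofer, Tamar |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9395509/ https://www.ncbi.nlm.nih.gov/pubmed/35995843 http://dx.doi.org/10.1038/s42003-022-03812-z |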
_version_ | 1784771710362320896 |
---|---|
author | Elgart, Michael Lyons, Genevieve Romero-Brufau, Santiago Kurniansyah, Nuzulul Brody, Jennifer A. Guo, Xiuqing Lin, Henry J. Raffield, Laura Gao, Yan Chen, Han de Vries, Paul Lloyd-Jones, Donald M. Lange, Leslie A. Peloso, Gina M. Fornage, Myriam Rotter, Jerome I. Rich, Stephen S. Morrison, Alanna C. Psaty, Bruce M. Levy, Daniel Redline, Susan Sofer, Tamar |
author_facet | Elgart, Michael Lyons, Genevieve Romero-Brufau, Santiago Kurniansyah, Nuzulul Brody, Jennifer A. Guo, Xiuqing Lin, Henry J. Raffield, Laura Gao, Yan Chen, Han de Vries, Paul Lloyd-Jones, Donald M. Lange, Leslie A. Peloso, Gina M. Fornage, Myriam Rotter, Jerome I. Rich, Stephen S. Morrison, Alanna C. Psaty, Bruce M. Levy, Daniel Redline, Susan Sofer, Tamar |
author_sort | Elgart, Michael |
collection | PubMed |
description | Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard, linear PRS model developed using PRSice, LDpred2, and lassosum2. Combining a PRS as a feature in an XGBoost model results in a relative increase in the percentage variance explained compared to the standard linear PRS model by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to specific racial/ethnic group trained models and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models. |
format | Online Article Text |
id | pubmed-9395509 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9395509 2022-08-24 Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations Elgart, Michael Lyons, Genevieve Romero-Brufau, Santiago Kurniansyah, Nuzulul Brody, Jennifer A. Guo, Xiuqing Lin, Henry J. Raffield, Laura Gao, Yan Chen, Han de Vries, Paul Lloyd-Jones, Donald M. Lange, Leslie A. Peloso, Gina M. Fornage, Myriam Rotter, Jerome I. Rich, Stephen S. Morrison, Alanna C. Psaty, Bruce M. Levy, Daniel Redline, Susan Sofer, Tamar Commun Biol Article Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard, linear PRS model developed using PRSice, LDpred2, and lassosum2. Combining a PRS as a feature in an XGBoost model results in a relative increase in the percentage variance explained compared to the standard linear PRS model by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to specific racial/ethnic group trained models and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models. Nature Publishing Group UK 2022-08-22 /pmc/articles/PMC9395509/ /pubmed/35995843 http://dx.doi.org/10.1038/s42003-022-03812-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Elgart, Michael Lyons, Genevieve Romero-Brufau, Santiago Kurniansyah, Nuzulul Brody, Jennifer A. Guo, Xiuqing Lin, Henry J. Raffield, Laura Gao, Yan Chen, Han de Vries, Paul Lloyd-Jones, Donald M. Lange, Leslie A. Peloso, Gina M. Fornage, Myriam Rotter, Jerome I. Rich, Stephen S. Morrison, Alanna C. Psaty, Bruce M. Levy, Daniel Redline, Susan Sofer, Tamar Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations |
title | Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations |
title_full | Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations |
title_fullStr | Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations |
title_full_unstemmed | Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations |
title_short | Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations |
title_sort | non-linear machine learning models incorporating snps and prs improve polygenic prediction in diverse human populations |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9395509/ https://www.ncbi.nlm.nih.gov/pubmed/35995843 http://dx.doi.org/10.1038/s42003-022-03812-z |
work_keys_str_mv | AT elgartmichael nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT lyonsgenevieve nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT romerobrufausantiago nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT kurniansyahnuzulul nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT brodyjennifera nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT guoxiuqing nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT linhenryj nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT raffieldlaura nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT gaoyan nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT chenhan nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT devriespaul nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT lloydjonesdonaldm nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT langelesliea nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT pelosoginam nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT fornagemyriam nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT rotterjeromei nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT richstephens 
nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT morrisonalannac nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT psatybrucem nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT levydaniel nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT redlinesusan nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations AT sofertamar nonlinearmachinelearningmodelsincorporatingsnpsandprsimprovepolygenicpredictionindiversehumanpopulations |
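
The description field reports gains as a "relative increase in the percentage variance explained" over the standard linear PRS model. The sketch below illustrates how such a figure can be computed, assuming percentage variance explained is taken as 100 × (1 − SSE/SST). That definition and the function names `percent_variance_explained` and `relative_increase` are assumptions made for illustration; the record does not state the article's exact formula or any covariate adjustment.

```python
def percent_variance_explained(y_true, y_pred):
    """Return 100 * (1 - SSE/SST), i.e. R^2 expressed as a percentage.

    Assumed definition for illustration; the article may compute it differently.
    """
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares of the predictions.
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the phenotype mean.
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return 100.0 * (1.0 - sse / sst)


def relative_increase(pve_baseline, pve_model):
    """Relative increase, in percent, of pve_model over pve_baseline."""
    return 100.0 * (pve_model - pve_baseline) / pve_baseline


# Hypothetical numbers, not taken from the article: a model that explains
# 8% of the variance against a baseline that explains 4% doubles the
# variance explained, which is a 100% relative increase.
example = relative_increase(4.0, 8.0)  # 100.0
```

Under this reading, the 100% figure quoted for diastolic blood pressure would mean the variance explained roughly doubled relative to the linear PRS baseline, not that all variance was explained.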