Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations

Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in...

Bibliographic Details
Main authors: Elgart, Michael, Lyons, Genevieve, Romero-Brufau, Santiago, Kurniansyah, Nuzulul, Brody, Jennifer A., Guo, Xiuqing, Lin, Henry J., Raffield, Laura, Gao, Yan, Chen, Han, de Vries, Paul, Lloyd-Jones, Donald M., Lange, Leslie A., Peloso, Gina M., Fornage, Myriam, Rotter, Jerome I., Rich, Stephen S., Morrison, Alanna C., Psaty, Bruce M., Levy, Daniel, Redline, Susan, Sofer, Tamar
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9395509/
https://www.ncbi.nlm.nih.gov/pubmed/35995843
http://dx.doi.org/10.1038/s42003-022-03812-z
author Elgart, Michael
Lyons, Genevieve
Romero-Brufau, Santiago
Kurniansyah, Nuzulul
Brody, Jennifer A.
Guo, Xiuqing
Lin, Henry J.
Raffield, Laura
Gao, Yan
Chen, Han
de Vries, Paul
Lloyd-Jones, Donald M.
Lange, Leslie A.
Peloso, Gina M.
Fornage, Myriam
Rotter, Jerome I.
Rich, Stephen S.
Morrison, Alanna C.
Psaty, Bruce M.
Levy, Daniel
Redline, Susan
Sofer, Tamar
author_sort Elgart, Michael
collection PubMed
description Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard, linear PRS model developed using PRSice, LDpred2, and lassosum2. Combining a PRS as a feature in an XGBoost model results in a relative increase in the percentage variance explained compared to the standard linear PRS model by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to specific racial/ethnic group trained models and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models.
format Online
Article
Text
id pubmed-9395509
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9395509 2022-08-24 Commun Biol Article
Nature Publishing Group UK 2022-08-22 /pmc/articles/PMC9395509/ /pubmed/35995843 http://dx.doi.org/10.1038/s42003-022-03812-z Text en © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.
title Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9395509/
https://www.ncbi.nlm.nih.gov/pubmed/35995843
http://dx.doi.org/10.1038/s42003-022-03812-z