
Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank

We use UK Biobank data to train predictors for 65 blood and urine markers such as HDL, LDL, lipoprotein A, glycated haemoglobin, etc. from SNP genotype. For example, our Polygenic Score (PGS) predictor correlates ∼0.76 with lipoprotein A level, which is highly heritable and an independent risk facto...


Bibliographic Details
Main Authors: Widen, Erik, Raben, Timothy G., Lello, Louis, Hsu, Stephen D. H.
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8308062/
https://www.ncbi.nlm.nih.gov/pubmed/34209487
http://dx.doi.org/10.3390/genes12070991
author Widen, Erik
Raben, Timothy G.
Lello, Louis
Hsu, Stephen D. H.
collection PubMed
description We use UK Biobank data to train predictors for 65 blood and urine markers such as HDL, LDL, lipoprotein A, glycated haemoglobin, etc. from SNP genotype. For example, our Polygenic Score (PGS) predictor correlates ∼0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait that has yet been produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information); we call these predictors biomarker risk scores, BMRS. Individuals who are at high risk (e.g., odds ratio of >5× population average) can be identified for conditions such as coronary artery disease (AUC∼0.75), diabetes (AUC∼0.95), hypertension, liver and kidney problems, and cancer using biomarkers alone. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ∼10 biomarkers and performs in UKB evaluation as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (risk conditional on genotype: PRS) for common diseases to the risk predictors which result from the concatenation of learned functions BMRS and PGS, i.e., applying the BMRS predictors to the PGS output.
format Online
Article
Text
id pubmed-8308062
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8308062 2021-07-25 Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank Widen, Erik Raben, Timothy G. Lello, Louis Hsu, Stephen D. H. Genes (Basel) Article We use UK Biobank data to train predictors for 65 blood and urine markers such as HDL, LDL, lipoprotein A, glycated haemoglobin, etc. from SNP genotype. For example, our Polygenic Score (PGS) predictor correlates ∼0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait that has yet been produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information); we call these predictors biomarker risk scores, BMRS. Individuals who are at high risk (e.g., odds ratio of >5× population average) can be identified for conditions such as coronary artery disease (AUC∼0.75), diabetes (AUC∼0.95), hypertension, liver and kidney problems, and cancer using biomarkers alone. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ∼10 biomarkers and performs in UKB evaluation as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (risk conditional on genotype: PRS) for common diseases to the risk predictors which result from the concatenation of learned functions BMRS and PGS, i.e., applying the BMRS predictors to the PGS output. MDPI 2021-06-29 /pmc/articles/PMC8308062/ /pubmed/34209487 http://dx.doi.org/10.3390/genes12070991 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8308062/
https://www.ncbi.nlm.nih.gov/pubmed/34209487
http://dx.doi.org/10.3390/genes12070991