Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants

BACKGROUND: Identifying people at risk of cardiovascular diseases (CVD) is a cornerstone of preventative cardiology. Risk prediction models currently recommended by clinical guidelines are typically based on a limited number of predictors with sub-optimal performance across all patient groups. Data-...

Bibliographic Details
Main Authors: Alaa, Ahmed M., Bolton, Thomas, Di Angelantonio, Emanuele, Rudd, James H. F., van der Schaar, Mihaela
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6519796/
https://www.ncbi.nlm.nih.gov/pubmed/31091238
http://dx.doi.org/10.1371/journal.pone.0213653
author Alaa, Ahmed M.
Bolton, Thomas
Di Angelantonio, Emanuele
Rudd, James H. F.
van der Schaar, Mihaela
collection PubMed
description BACKGROUND: Identifying people at risk of cardiovascular diseases (CVD) is a cornerstone of preventative cardiology. Risk prediction models currently recommended by clinical guidelines are typically based on a limited number of predictors with sub-optimal performance across all patient groups. Data-driven techniques based on machine learning (ML) might improve the performance of risk predictions by agnostically discovering novel risk predictors and learning the complex interactions between them. We tested (1) whether ML techniques based on a state-of-the-art automated ML framework (AutoPrognosis) could improve CVD risk prediction compared to traditional approaches, and (2) whether considering non-traditional variables could increase the accuracy of CVD risk predictions. METHODS AND FINDINGS: Using data on 423,604 participants without CVD at baseline in UK Biobank, we developed an ML-based model for predicting CVD risk based on 473 available variables. Our ML-based model was derived using AutoPrognosis, an algorithmic tool that automatically selects and tunes ensembles of ML modeling pipelines (comprising data imputation, feature processing, classification and calibration algorithms). We compared our model with a well-established risk prediction algorithm based on conventional CVD risk factors (Framingham score), a Cox proportional hazards (PH) model based on familiar risk factors (i.e., age, gender, smoking status, systolic blood pressure, history of diabetes, receipt of treatment for hypertension and body mass index), and a Cox PH model based on all of the 473 available variables. Predictive performances were assessed using area under the receiver operating characteristic curve (AUC-ROC).
Overall, our AutoPrognosis model improved risk prediction (AUC-ROC: 0.774, 95% CI: 0.768-0.780) compared to the Framingham score (AUC-ROC: 0.724, 95% CI: 0.720-0.728, p < 0.001), a Cox PH model with conventional risk factors (AUC-ROC: 0.734, 95% CI: 0.729-0.739, p < 0.001), and a Cox PH model with all UK Biobank variables (AUC-ROC: 0.758, 95% CI: 0.753-0.763, p < 0.001). Out of 4,801 CVD cases recorded within 5 years of baseline, AutoPrognosis was able to correctly predict 368 more cases compared to the Framingham score. Our AutoPrognosis model included predictors that are not usually considered in existing risk prediction models, such as individuals’ usual walking pace and their self-reported overall health rating. Furthermore, our model improved risk prediction in potentially relevant sub-populations, such as in individuals with a history of diabetes. We also highlight the relative benefits accrued from including more information in a predictive model (information gain) as compared to the benefits of using more complex models (modeling gain). CONCLUSIONS: Our AutoPrognosis model improves the accuracy of CVD risk prediction in the UK Biobank population. This approach performs well in traditionally poorly served patient subgroups. Additionally, AutoPrognosis uncovered novel predictors for CVD that may now be tested in prospective studies. We found that the “information gain” achieved by considering more risk factors in the predictive model was significantly higher than the “modeling gain” achieved by adopting complex predictive models.
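Illustrative note on the metrics in the abstract above. The following is a minimal Python sketch, not the authors' code. It shows (a) how an AUC-ROC can be computed as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, and (b) one plausible way to read "information gain" and "modeling gain" from the reported AUC-ROC values. The function name auc_roc, the toy labels and scores, and the decomposition itself are assumptions made for illustration; the article may define the two gains differently.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the chance that a random case outranks a random non-case.

    Ties between a case and a non-case count as half a win
    (the Mann-Whitney U formulation). Hypothetical helper, O(n^2).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration (not UK Biobank data).
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
toy_auc = auc_roc(toy_labels, toy_scores)

# AUC-ROC point estimates as reported in the abstract.
reported = {
    "framingham": 0.724,
    "cox_conventional": 0.734,
    "cox_all_variables": 0.758,
    "autoprognosis": 0.774,
}

# Assumed decomposition (an interpretation, not taken from the article):
#   information gain = same model class (Cox PH), more variables
#   modeling gain    = same variables, more complex model
information_gain = round(
    reported["cox_all_variables"] - reported["cox_conventional"], 3
)
modeling_gain = round(
    reported["autoprognosis"] - reported["cox_all_variables"], 3
)

# Share of the 4,801 five-year CVD cases represented by the
# 368 additional cases that AutoPrognosis predicted correctly.
extra_case_share = 368 / 4801

print(f"toy AUC-ROC:       {toy_auc:.3f}")
print(f"information gain:  {information_gain:.3f}")
print(f"modeling gain:     {modeling_gain:.3f}")
print(f"extra cases share: {extra_case_share:.1%}")
```

Under this reading the information gain (0.024) exceeds the modeling gain (0.016), which agrees with the direction of the abstract's conclusion. The pairwise loop is chosen for clarity; a rank-based implementation would be preferable for a cohort of this size.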
format Online
Article
Text
id pubmed-6519796
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6519796 2019-05-31 PLoS One Research Article
Public Library of Science 2019-05-15 /pmc/articles/PMC6519796/ /pubmed/31091238 http://dx.doi.org/10.1371/journal.pone.0213653 Text en © 2019 Alaa et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6519796/
https://www.ncbi.nlm.nih.gov/pubmed/31091238
http://dx.doi.org/10.1371/journal.pone.0213653