
Machine learning approaches improve risk stratification for secondary cardiovascular disease prevention in multiethnic patients


Bibliographic Details
Main authors: Sarraju, Ashish; Ward, Andrew; Chung, Sukyung; Li, Jiang; Scheinker, David; Rodríguez, Fàtima
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8527119/
https://www.ncbi.nlm.nih.gov/pubmed/34667093
http://dx.doi.org/10.1136/openhrt-2021-001802
author Sarraju, Ashish
Ward, Andrew
Chung, Sukyung
Li, Jiang
Scheinker, David
Rodríguez, Fàtima
collection PubMed
description OBJECTIVES: Identifying high-risk patients is crucial for effective cardiovascular disease (CVD) prevention. It is not known whether electronic health record (EHR)-based machine-learning (ML) models can improve CVD risk stratification compared with a secondary prevention risk score developed from randomised clinical trials (Thrombolysis in Myocardial Infarction Risk Score for Secondary Prevention, TRS 2°P).
METHODS: We identified patients with CVD in a large health system, including atherosclerotic CVD (ASCVD), split into 80% training and 20% test sets. A rich set of EHR patient features was extracted. ML models were trained to estimate 5-year CVD event risk (random forests (RF), gradient-boosted machines (GBM), extreme gradient-boosted models (XGBoost), logistic regression with an L2 penalty and L1 penalty (Lasso)). ML models and TRS 2°P were evaluated by the area under the receiver operating characteristic curve (AUC).
RESULTS: The cohort included 32 192 patients (median age 74 years, with 46% female, 63% non-Hispanic white and 12% Asian patients and 23 475 patients with ASCVD). There were 4010 events over 5 years of follow-up. ML models demonstrated good overall performance; XGBoost demonstrated AUC 0.70 (95% CI 0.68 to 0.71) in the full CVD cohort and AUC 0.71 (95% CI 0.69 to 0.73) in patients with ASCVD, with comparable performance by GBM, RF and Lasso. TRS 2°P performed poorly in all CVD (AUC 0.51, 95% CI 0.50 to 0.53) and ASCVD (AUC 0.50, 95% CI 0.48 to 0.52) patients. ML identified nontraditional predictive variables including education level and primary care visits.
CONCLUSIONS: In a multiethnic real-world population, EHR-based ML approaches significantly improved CVD risk stratification for secondary prevention.
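The abstract compares models by AUC, so a small illustration of that metric may help in reading the reported values. The sketch below is not the authors' code: it computes AUC through the rank-sum identity (the probability that a patient with an event is scored above a patient without one, ties counting one half). The function name `auc` and the toy scores and labels are hypothetical and are not drawn from the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a randomly chosen patient with an event
    receives a higher score than a randomly chosen patient without one;
    tied scores count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption, not study data): 1 = event in follow-up.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
informative = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
uninformative = [0.5] * len(labels)  # same score for every patient

print(auc(informative, labels))    # most event patients ranked above non-events
print(auc(uninformative, labels))  # all ties, so the score carries no ranking information
```

On this reading, a score that ranks patients no better than chance yields an AUC of 0.5, which is the level the abstract reports for TRS 2°P (0.50 to 0.51), while the ML models' 0.70 to 0.71 means an event patient is ranked above a non-event patient roughly 70% of the time.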
format Online
Article
Text
id pubmed-8527119
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-8527119 2021-11-04 Open Heart Cardiac Risk Factors and Prevention BMJ Publishing Group 2021-10-19 /pmc/articles/PMC8527119/ /pubmed/34667093 http://dx.doi.org/10.1136/openhrt-2021-001802 Text en © Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: https://creativecommons.org/licenses/by-nc/4.0/
title Machine learning approaches improve risk stratification for secondary cardiovascular disease prevention in multiethnic patients
topic Cardiac Risk Factors and Prevention