Machine learning approaches improve risk stratification for secondary cardiovascular disease prevention in multiethnic patients
OBJECTIVES: Identifying high-risk patients is crucial for effective cardiovascular disease (CVD) prevention. It is not known whether electronic health record (EHR)-based machine-learning (ML) models can improve CVD risk stratification compared with a secondary prevention risk score developed from ra...
Main Authors: | Sarraju, Ashish; Ward, Andrew; Chung, Sukyung; Li, Jiang; Scheinker, David; Rodríguez, Fàtima |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BMJ Publishing Group 2021 |
Subjects: | Cardiac Risk Factors and Prevention |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8527119/ https://www.ncbi.nlm.nih.gov/pubmed/34667093 http://dx.doi.org/10.1136/openhrt-2021-001802 |
_version_ | 1784586015083593728 |
---|---|
author | Sarraju, Ashish; Ward, Andrew; Chung, Sukyung; Li, Jiang; Scheinker, David; Rodríguez, Fàtima |
author_facet | Sarraju, Ashish; Ward, Andrew; Chung, Sukyung; Li, Jiang; Scheinker, David; Rodríguez, Fàtima |
author_sort | Sarraju, Ashish |
collection | PubMed |
description | OBJECTIVES: Identifying high-risk patients is crucial for effective cardiovascular disease (CVD) prevention. It is not known whether electronic health record (EHR)-based machine-learning (ML) models can improve CVD risk stratification compared with a secondary prevention risk score developed from randomised clinical trials (Thrombolysis in Myocardial Infarction Risk Score for Secondary Prevention, TRS 2°P). METHODS: We identified patients with CVD in a large health system, including atherosclerotic CVD (ASCVD), split into 80% training and 20% test sets. A rich set of EHR patient features was extracted. ML models were trained to estimate 5-year CVD event risk (random forests (RF), gradient-boosted machines (GBM), extreme gradient-boosted models (XGBoost), logistic regression with an L2 penalty and L1 penalty (Lasso)). ML models and TRS 2°P were evaluated by the area under the receiver operating characteristic curve (AUC). RESULTS: The cohort included 32 192 patients (median age 74 years, with 46% female, 63% non-Hispanic white and 12% Asian patients and 23 475 patients with ASCVD). There were 4010 events over 5 years of follow-up. ML models demonstrated good overall performance; XGBoost demonstrated AUC 0.70 (95% CI 0.68 to 0.71) in the full CVD cohort and AUC 0.71 (95% CI 0.69 to 0.73) in patients with ASCVD, with comparable performance by GBM, RF and Lasso. TRS 2°P performed poorly in all CVD (AUC 0.51, 95% CI 0.50 to 0.53) and ASCVD (AUC 0.50, 95% CI 0.48 to 0.52) patients. ML identified nontraditional predictive variables including education level and primary care visits. CONCLUSIONS: In a multiethnic real-world population, EHR-based ML approaches significantly improved CVD risk stratification for secondary prevention. |
format | Online Article Text |
id | pubmed-8527119 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BMJ Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-8527119 2021-11-04 Machine learning approaches improve risk stratification for secondary cardiovascular disease prevention in multiethnic patients Sarraju, Ashish; Ward, Andrew; Chung, Sukyung; Li, Jiang; Scheinker, David; Rodríguez, Fàtima Open Heart Cardiac Risk Factors and Prevention OBJECTIVES: Identifying high-risk patients is crucial for effective cardiovascular disease (CVD) prevention. It is not known whether electronic health record (EHR)-based machine-learning (ML) models can improve CVD risk stratification compared with a secondary prevention risk score developed from randomised clinical trials (Thrombolysis in Myocardial Infarction Risk Score for Secondary Prevention, TRS 2°P). METHODS: We identified patients with CVD in a large health system, including atherosclerotic CVD (ASCVD), split into 80% training and 20% test sets. A rich set of EHR patient features was extracted. ML models were trained to estimate 5-year CVD event risk (random forests (RF), gradient-boosted machines (GBM), extreme gradient-boosted models (XGBoost), logistic regression with an L2 penalty and L1 penalty (Lasso)). ML models and TRS 2°P were evaluated by the area under the receiver operating characteristic curve (AUC). RESULTS: The cohort included 32 192 patients (median age 74 years, with 46% female, 63% non-Hispanic white and 12% Asian patients and 23 475 patients with ASCVD). There were 4010 events over 5 years of follow-up. ML models demonstrated good overall performance; XGBoost demonstrated AUC 0.70 (95% CI 0.68 to 0.71) in the full CVD cohort and AUC 0.71 (95% CI 0.69 to 0.73) in patients with ASCVD, with comparable performance by GBM, RF and Lasso. TRS 2°P performed poorly in all CVD (AUC 0.51, 95% CI 0.50 to 0.53) and ASCVD (AUC 0.50, 95% CI 0.48 to 0.52) patients. ML identified nontraditional predictive variables including education level and primary care visits. CONCLUSIONS: In a multiethnic real-world population, EHR-based ML approaches significantly improved CVD risk stratification for secondary prevention. BMJ Publishing Group 2021-10-19 /pmc/articles/PMC8527119/ /pubmed/34667093 http://dx.doi.org/10.1136/openhrt-2021-001802 Text en © Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/. |
spellingShingle | Cardiac Risk Factors and Prevention Sarraju, Ashish Ward, Andrew Chung, Sukyung Li, Jiang Scheinker, David Rodríguez, Fàtima Machine learning approaches improve risk stratification for secondary cardiovascular disease prevention in multiethnic patients |
title | Machine learning approaches improve risk stratification for secondary cardiovascular disease prevention in multiethnic patients |
topic | Cardiac Risk Factors and Prevention |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8527119/ https://www.ncbi.nlm.nih.gov/pubmed/34667093 http://dx.doi.org/10.1136/openhrt-2021-001802 |
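
The abstract reports each model's discrimination as an AUC with a 95% CI. As a rough illustration of that kind of evaluation, the sketch below computes AUC from ranks (the Mann-Whitney formulation) and a percentile bootstrap interval, using only the Python standard library. The record does not say how the authors computed their intervals, so the bootstrap is an assumption, and the function names (`auc`, `bootstrap_auc_ci`) and the synthetic labels and scores in the demo are hypothetical, not study data.

```python
import random


def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the run of scores tied with pairs[i].
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one event and one non-event")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap."""
    point = auc(labels, scores)  # also checks that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = max(lo_idx, int((1 - alpha / 2) * n_boot) - 1)
    return point, stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Synthetic stand-in for a held-out test set; not data from the study.
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.12 else 0 for _ in range(500)]
    informative = [0.5 * y + rng.random() for y in labels]
    uninformative = [rng.random() for _ in labels]
    for name, s in (("informative", informative), ("uninformative", uninformative)):
        point, lo, hi = bootstrap_auc_ci(labels, s, n_boot=200)
        print(f"{name} score: AUC {point:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An AUC near 0.5, as the abstract reports for TRS 2°P, means the score ranks a randomly chosen patient with an event above one without an event only about half the time, which is no better than chance.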