
Understanding Arteriosclerotic Heart Disease Patients Using Electronic Health Records: A Machine Learning and Shapley Additive exPlanations Approach

OBJECTIVES: The number of deaths from cardiovascular disease is projected to reach 23.3 million by 2030. As a contribution to preventing this phenomenon, this paper proposed a machine learning (ML) model to predict patients with arteriosclerotic heart disease (AHD). We also interpreted the predictio...

Bibliographic Details
Main authors: Miranda, Eka; Adiarto, Suko; Bhatti, Faqir M.; Zakiyyah, Alfi Yusrotis; Aryuni, Mediana; Bernando, Charles
Format: Online Article Text
Language: English
Published: Korean Society of Medical Informatics 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10440196/
https://www.ncbi.nlm.nih.gov/pubmed/37591678
http://dx.doi.org/10.4258/hir.2023.29.3.228
author Miranda, Eka
Adiarto, Suko
Bhatti, Faqir M.
Zakiyyah, Alfi Yusrotis
Aryuni, Mediana
Bernando, Charles
author_sort Miranda, Eka
collection PubMed
description OBJECTIVES: The number of deaths from cardiovascular disease is projected to reach 23.3 million by 2030. As a contribution to preventing this phenomenon, this paper proposed a machine learning (ML) model to predict patients with arteriosclerotic heart disease (AHD). We also interpreted the prediction model results based on the ML approach and deployed model-agnostic ML methods to identify informative features and their interpretations. METHODS: We used a hematology Electronic Health Record (EHR) with information on erythrocytes, hematocrit, hemoglobin, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, leukocytes, thrombocytes, age, and sex. To detect and predict AHD, we explored random forest (RF), XGBoost, and AdaBoost models. We examined the prediction model results based on the confusion matrix and accuracy measures. We used the Shapley Additive exPlanations (SHAP) framework to interpret the ML model and quantify the contribution of features to predictions. RESULTS: Our study included data from 6,837 patients, with 4,702 records from patients diagnosed with AHD and 2,135 records from patients without an AHD diagnosis. AdaBoost outperformed RF and XGBoost, achieving an accuracy of 0.78, precision of 0.82, F1-score of 0.85, and recall of 0.88. According to the SHAP summary bar plot method, hemoglobin was the most important attribute for detecting and predicting AHD patients. The SHAP local interpretability bar plot revealed that hemoglobin and mean corpuscular hemoglobin concentration had positive impacts on AHD prediction based on a single observation. CONCLUSIONS: ML models based on real clinical data can be used to predict AHD.
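The figures quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the article; it only recomputes the F1-score from the reported precision and recall, confirms that the two class counts sum to the cohort size, and derives a majority-class baseline accuracy for context. The function names are hypothetical, and the baseline assumes the evaluation split had roughly the same class ratio as the full cohort, which the abstract does not state.

```python
# Figures quoted in the abstract (AdaBoost results and cohort sizes).
N_AHD = 4702          # records from patients diagnosed with AHD
N_NON_AHD = 2135      # records from patients without an AHD diagnosis
N_TOTAL = 6837        # total cohort size as reported
PRECISION = 0.82
RECALL = 0.88
REPORTED_F1 = 0.85
REPORTED_ACCURACY = 0.78


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def majority_baseline_accuracy(n_pos: int, n_neg: int) -> float:
    """Accuracy of a trivial classifier that always predicts the larger class.

    Assumption (not from the abstract): the evaluation data has the same
    class ratio as the whole cohort.
    """
    return max(n_pos, n_neg) / (n_pos + n_neg)


f1 = f1_score(PRECISION, RECALL)
baseline = majority_baseline_accuracy(N_AHD, N_NON_AHD)

print(f"class counts sum to cohort size: {N_AHD + N_NON_AHD == N_TOTAL}")
print(f"F1 recomputed from precision/recall: {f1:.4f} (reported {REPORTED_F1})")
print(f"majority-class baseline accuracy: {baseline:.4f} "
      f"(reported model accuracy {REPORTED_ACCURACY})")
```

Under these assumptions the reported F1-score agrees with the reported precision and recall after rounding to two decimals, and the baseline gives a reference point for reading the accuracy figure on an imbalanced cohort.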
format Online
Article
Text
id pubmed-10440196
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Korean Society of Medical Informatics
record_format MEDLINE/PubMed
spelling pubmed-10440196 2023-08-21 Healthc Inform Res Original Article Korean Society of Medical Informatics 2023-07 2023-07-31 /pmc/articles/PMC10440196/ /pubmed/37591678 http://dx.doi.org/10.4258/hir.2023.29.3.228 Text en © 2023 The Korean Society of Medical Informatics
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Understanding Arteriosclerotic Heart Disease Patients Using Electronic Health Records: A Machine Learning and Shapley Additive exPlanations Approach
topic Original Article