
Comparing Explainable Machine Learning Approaches With Traditional Statistical Methods for Evaluating Stroke Risk Models: Retrospective Cohort Study

BACKGROUND: Stroke has multiple modifiable and nonmodifiable risk factors and represents a leading cause of death globally. Understanding the complex interplay of stroke risk factors is thus not only a scientific necessity but a critical step toward improving global health outcomes. OBJECTIVE: We ai...


Bibliographic Details
Main Authors:	Lolak, Sermkiat; Attia, John; McKay, Gareth J; Thakkinstian, Ammarin
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2023
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10413234/ https://www.ncbi.nlm.nih.gov/pubmed/37494080 http://dx.doi.org/10.2196/47736

author	Lolak, Sermkiat; Attia, John; McKay, Gareth J; Thakkinstian, Ammarin
collection	PubMed
description	BACKGROUND: Stroke has multiple modifiable and nonmodifiable risk factors and represents a leading cause of death globally. Understanding the complex interplay of stroke risk factors is thus not only a scientific necessity but a critical step toward improving global health outcomes. OBJECTIVE: We aim to assess the performance of explainable machine learning models in predicting stroke risk factors using real-world cohort data by comparing explainable machine learning models with conventional statistical methods. METHODS: This retrospective cohort included high-risk patients from Ramathibodi Hospital in Thailand between January 2010 and December 2020. We compared the performance and explainability of logistic regression (LR), Cox proportional hazard, Bayesian network (BN), tree-augmented Naïve Bayes (TAN), extreme gradient boosting (XGBoost), and explainable boosting machine (EBM) models. We used multiple imputation by chained equations for missing data and discretized continuous variables as needed. Models were evaluated using C-statistics and F1-scores. RESULTS: Out of 275,247 high-risk patients, 9659 (3.5%) experienced a stroke. XGBoost demonstrated the highest performance with a C-statistic of 0.89 and an F1-score of 0.80, followed by EBM and TAN with C-statistics of 0.87 and 0.83, respectively; LR and BN had similar C-statistics of 0.80. Significant factors associated with stroke included atrial fibrillation (AF), hypertension (HT), antiplatelets, HDL, and age. AF, HT, and antihypertensive medication were common significant factors across most models, with AF being the strongest factor in LR, XGBoost, BN, and TAN models. CONCLUSIONS: Our study developed stroke prediction models to identify crucial predictive factors such as AF, HT, or systolic blood pressure or antihypertensive medication, anticoagulant medication, HDL, age, and statin use in high-risk patients. The explainable XGBoost was the best model in predicting stroke risk, followed by EBM.
format	Online Article Text
id	pubmed-10413234
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-10413234 2023-08-11 Comparing Explainable Machine Learning Approaches With Traditional Statistical Methods for Evaluating Stroke Risk Models: Retrospective Cohort Study Lolak, Sermkiat; Attia, John; McKay, Gareth J; Thakkinstian, Ammarin JMIR Cardio Original Paper JMIR Publications 2023-07-26 /pmc/articles/PMC10413234/ /pubmed/37494080 http://dx.doi.org/10.2196/47736 Text en ©Sermkiat Lolak, John Attia, Gareth J McKay, Ammarin Thakkinstian. Originally published in JMIR Cardio (https://cardio.jmir.org), 26.07.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Cardio, is properly cited. The complete bibliographic information, a link to the original publication on https://cardio.jmir.org, as well as this copyright and license information must be included.
title	Comparing Explainable Machine Learning Approaches With Traditional Statistical Methods for Evaluating Stroke Risk Models: Retrospective Cohort Study
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10413234/ https://www.ncbi.nlm.nih.gov/pubmed/37494080 http://dx.doi.org/10.2196/47736