
An Explainable Machine Learning Pipeline for Stroke Prediction on Imbalanced Data

Stroke is an acute neurological dysfunction attributed to a focal injury of the central nervous system due to reduced blood flow to the brain. Nowadays, stroke is a global threat associated with premature death and huge economic consequences. Hence, there is an urgency to model the effect of several...


Bibliographic Details
Main authors:	Kokkotis, Christos; Giarmatzis, Georgios; Giannakou, Erasmia; Moustakidis, Serafeim; Tsatalas, Themistoklis; Tsiptsios, Dimitrios; Vadikolias, Konstantinos; Aggelousis, Nikolaos
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9600473/ https://www.ncbi.nlm.nih.gov/pubmed/36292081 http://dx.doi.org/10.3390/diagnostics12102392

author	Kokkotis, Christos; Giarmatzis, Georgios; Giannakou, Erasmia; Moustakidis, Serafeim; Tsatalas, Themistoklis; Tsiptsios, Dimitrios; Vadikolias, Konstantinos; Aggelousis, Nikolaos
author_sort	Kokkotis, Christos
collection	PubMed
description	Stroke is an acute neurological dysfunction attributed to a focal injury of the central nervous system due to reduced blood flow to the brain. Nowadays, stroke is a global threat associated with premature death and huge economic consequences. Hence, there is an urgency to model the effect of several risk factors on stroke occurrence, and artificial intelligence (AI) seems to be the appropriate tool. In the present study, we aimed to (i) develop reliable machine learning (ML) prediction models for stroke disease; (ii) cope with a typical severe class imbalance problem, which is posed due to the stroke patients’ class being significantly smaller than the healthy class; and (iii) interpret the model output for understanding the decision-making mechanism. The effectiveness of the proposed ML approach was investigated in a comparative analysis with six well-known classifiers with respect to metrics that are related to both generalization capability and prediction accuracy. The best overall false-negative rate was achieved by the Multi-Layer Perceptron (MLP) classifier (18.60%). Shapley Additive Explanations (SHAP) were employed to investigate the impact of the risk factors on the prediction output. The proposed AI method could lead to the creation of advanced and effective risk stratification strategies for each stroke patient, which would allow for timely diagnosis and the right treatments.
format	Online Article Text
id	pubmed-9600473
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9600473 2022-10-27 An Explainable Machine Learning Pipeline for Stroke Prediction on Imbalanced Data. Diagnostics (Basel), Article. MDPI, 2022-10-01. /pmc/articles/PMC9600473/ /pubmed/36292081 http://dx.doi.org/10.3390/diagnostics12102392 Text en. © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	An Explainable Machine Learning Pipeline for Stroke Prediction on Imbalanced Data
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9600473/ https://www.ncbi.nlm.nih.gov/pubmed/36292081 http://dx.doi.org/10.3390/diagnostics12102392
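The abstract reports a false-negative rate rather than plain accuracy, which matters when the stroke class is much smaller than the healthy class. The sketch below illustrates why, using hypothetical toy labels (95 healthy, 5 stroke) that are not taken from the study; the function names are illustrative and this is not the authors' code.

```python
def false_negative_rate(y_true, y_pred):
    """Share of true positives (stroke = 1) that were predicted as 0."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    if fn + tp == 0:
        raise ValueError("y_true contains no positive samples")
    return fn / (fn + tp)


def accuracy(y_true, y_pred):
    """Share of all samples predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced labels: 95 healthy (0) followed by 5 stroke (1).
y_true = [0] * 95 + [1] * 5

# Model A (assumed): always predicts "healthy".
always_healthy = [0] * 100

# Model B (assumed): 5 false alarms among the healthy, misses 1 of 5 strokes.
catches_four = [0] * 90 + [1] * 5 + [1, 1, 1, 1, 0]

for name, y_pred in [("always_healthy", always_healthy),
                     ("catches_four", catches_four)]:
    print(name, accuracy(y_true, y_pred), false_negative_rate(y_true, y_pred))
```

On these toy labels the model that never predicts stroke has the higher accuracy but misses every stroke case, which is why a metric focused on the minority class is the more informative one here.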
