Cargando…

Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach

BACKGROUND: Supervised machine learning (ML) is being featured in the health care literature with study results frequently reported using metrics such as accuracy, sensitivity, specificity, recall, or F1 score. Although each metric provides a different perspective on the performance, they remain to...

Descripción completa

Detalles Bibliográficos
Autores principales: Kline, Adrienne, Kline, Theresa, Shakeri Hossein Abad, Zahra, Lee, Joon
Formato: Online Artículo Texto
Lenguaje:English
Publicado: JMIR Publications 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7547395/
https://www.ncbi.nlm.nih.gov/pubmed/32975523
http://dx.doi.org/10.2196/20268
_version_ 1783592411890450432
author Kline, Adrienne
Kline, Theresa
Shakeri Hossein Abad, Zahra
Lee, Joon
author_facet Kline, Adrienne
Kline, Theresa
Shakeri Hossein Abad, Zahra
Lee, Joon
author_sort Kline, Adrienne
collection PubMed
description BACKGROUND: Supervised machine learning (ML) is being featured in the health care literature with study results frequently reported using metrics such as accuracy, sensitivity, specificity, recall, or F1 score. Although each metric provides a different perspective on the performance, they remain to be overall measures for the whole sample, discounting the uniqueness of each case or patient. Intuitively, we know that all cases are not equal, but the present evaluative approaches do not take case difficulty into account. OBJECTIVE: A more case-based, comprehensive approach is warranted to assess supervised ML outcomes and forms the rationale for this study. This study aims to demonstrate how the item response theory (IRT) can be used to stratify the data based on how difficult each case is to classify, independent of the outcome measure of interest (eg, accuracy). This stratification allows the evaluation of ML classifiers to take the form of a distribution rather than a single scalar value. METHODS: Two large, public intensive care unit data sets, Medical Information Mart for Intensive Care III and electronic intensive care unit, were used to showcase this method in predicting mortality. For each data set, a balanced sample (n=8078 and n=21,940, respectively) and an imbalanced sample (n=12,117 and n=32,910, respectively) were drawn. A 2-parameter logistic model was used to provide scores for each case. Several ML algorithms were used in the demonstration to classify cases based on their health-related features: logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, naive Bayes, and a neural network. Generalized linear mixed model analyses were used to assess the effects of case difficulty strata, ML algorithm, and the interaction between them in predicting accuracy. RESULTS: The results showed significant effects (P<.001) for case difficulty strata, ML algorithm, and their interaction in predicting accuracy and illustrated that all classifiers performed better with easier-to-classify cases and that overall the neural network performed best. Significant interactions suggest that cases that fall in the most arduous strata should be handled by logistic regression, linear discriminant analysis, decision tree, or neural network but not by naive Bayes or K-nearest neighbors. Conventional metrics for ML classification have been reported for methodological comparison. CONCLUSIONS: This demonstration shows that using the IRT is a viable method for understanding the data that are provided to ML algorithms, independent of outcome measures, and highlights how well classifiers differentiate cases of varying difficulty. This method explains which features are indicative of healthy states and why. It enables end users to tailor the classifier that is appropriate to the difficulty level of the patient for personalized medicine.
format Online
Article
Text
id pubmed-7547395
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-75473952020-10-22 Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach Kline, Adrienne Kline, Theresa Shakeri Hossein Abad, Zahra Lee, Joon J Med Internet Res Original Paper BACKGROUND: Supervised machine learning (ML) is being featured in the health care literature with study results frequently reported using metrics such as accuracy, sensitivity, specificity, recall, or F1 score. Although each metric provides a different perspective on the performance, they remain to be overall measures for the whole sample, discounting the uniqueness of each case or patient. Intuitively, we know that all cases are not equal, but the present evaluative approaches do not take case difficulty into account. OBJECTIVE: A more case-based, comprehensive approach is warranted to assess supervised ML outcomes and forms the rationale for this study. This study aims to demonstrate how the item response theory (IRT) can be used to stratify the data based on how difficult each case is to classify, independent of the outcome measure of interest (eg, accuracy). This stratification allows the evaluation of ML classifiers to take the form of a distribution rather than a single scalar value. METHODS: Two large, public intensive care unit data sets, Medical Information Mart for Intensive Care III and electronic intensive care unit, were used to showcase this method in predicting mortality. For each data set, a balanced sample (n=8078 and n=21,940, respectively) and an imbalanced sample (n=12,117 and n=32,910, respectively) were drawn. A 2-parameter logistic model was used to provide scores for each case. Several ML algorithms were used in the demonstration to classify cases based on their health-related features: logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, naive Bayes, and a neural network. Generalized linear mixed model analyses were used to assess the effects of case difficulty strata, ML algorithm, and the interaction between them in predicting accuracy. RESULTS: The results showed significant effects (P<.001) for case difficulty strata, ML algorithm, and their interaction in predicting accuracy and illustrated that all classifiers performed better with easier-to-classify cases and that overall the neural network performed best. Significant interactions suggest that cases that fall in the most arduous strata should be handled by logistic regression, linear discriminant analysis, decision tree, or neural network but not by naive Bayes or K-nearest neighbors. Conventional metrics for ML classification have been reported for methodological comparison. CONCLUSIONS: This demonstration shows that using the IRT is a viable method for understanding the data that are provided to ML algorithms, independent of outcome measures, and highlights how well classifiers differentiate cases of varying difficulty. This method explains which features are indicative of healthy states and why. It enables end users to tailor the classifier that is appropriate to the difficulty level of the patient for personalized medicine. JMIR Publications 2020-09-25 /pmc/articles/PMC7547395/ /pubmed/32975523 http://dx.doi.org/10.2196/20268 Text en ©Adrienne Kline, Theresa Kline, Zahra Shakeri Hossein Abad, Joon Lee. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 25.09.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Kline, Adrienne
Kline, Theresa
Shakeri Hossein Abad, Zahra
Lee, Joon
Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach
title Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach
title_full Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach
title_fullStr Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach
title_full_unstemmed Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach
title_short Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach
title_sort using item response theory for explainable machine learning in predicting mortality in the intensive care unit: case-based approach
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7547395/
https://www.ncbi.nlm.nih.gov/pubmed/32975523
http://dx.doi.org/10.2196/20268
work_keys_str_mv AT klineadrienne usingitemresponsetheoryforexplainablemachinelearninginpredictingmortalityintheintensivecareunitcasebasedapproach
AT klinetheresa usingitemresponsetheoryforexplainablemachinelearninginpredictingmortalityintheintensivecareunitcasebasedapproach
AT shakerihosseinabadzahra usingitemresponsetheoryforexplainablemachinelearninginpredictingmortalityintheintensivecareunitcasebasedapproach
AT leejoon usingitemresponsetheoryforexplainablemachinelearninginpredictingmortalityintheintensivecareunitcasebasedapproach