Machine Learning–Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance

BACKGROUND: Machine learning algorithms are currently used in a wide array of clinical domains to produce models that can predict clinical risk events. Most models are developed and evaluated with retrospective data, very few are evaluated in a clinical workflow, and even fewer report performance across different hospitals. In this study, we provide detailed evaluations of clinical risk prediction models in live clinical workflows for three different use cases in three different hospitals. OBJECTIVE: The main objective of this study was to evaluate clinical risk prediction models in live clinical workflows and compare their performance in these settings with their performance on retrospective data. We also aimed to generalize the results by applying our investigation to three different use cases in three different hospitals. METHODS: We trained clinical risk prediction models for three use cases (ie, delirium, sepsis, and acute kidney injury) in three different hospitals with retrospective data. We used machine learning and, specifically, deep learning to train models based on the Transformer architecture. The models were trained using a calibration tool that is common to all hospitals and use cases; they shared a common design but were calibrated with each hospital’s specific data. The models were deployed in these three hospitals and used in daily clinical practice, and their predictions were logged and correlated with the diagnosis at discharge. We compared their live performance with evaluations on retrospective data and conducted cross-hospital evaluations. RESULTS: The performance of the prediction models on data from live clinical workflows was similar to their performance on retrospective data: the average area under the receiver operating characteristic curve (AUROC) decreased slightly, by 0.6 percentage points (from 94.8% to 94.2% at discharge). The cross-hospital evaluations exhibited severely reduced performance: the average AUROC decreased by 8 percentage points (from 94.2% to 86.3% at discharge), which indicates the importance of calibrating the model with data from the deployment hospital. CONCLUSIONS: Calibrating the prediction model with data from each deployment hospital led to good performance in live settings. The performance degradation in the cross-hospital evaluations reveals the limitations of developing one generic model for different hospitals: designing a generic model development process that generates specialized prediction models for each hospital ensures model performance across hospitals.
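To make the reported comparison concrete: AUROC measures how well a model's risk scores rank patients who develop the outcome above those who do not. The following Python sketch, which is not the authors' implementation, shows how such AUROC comparisons are typically computed with scikit-learn. All data are synthetic stand-ins: the binary outcomes play the role of the diagnoses at discharge, the scores play the role of the models' logged risk predictions, and the separation values are hypothetical numbers chosen so the resulting AUROCs land near the reported 94.8% and 86.3%.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def synthetic_predictions(n=100_000, separation=2.0):
    # Binary outcomes (stand-in for diagnosis at discharge) and risk
    # scores whose class separation controls the resulting AUROC: for
    # unit-variance Gaussian scores, expected AUROC = Phi(separation / sqrt(2)).
    outcomes = rng.integers(0, 2, size=n)
    scores = rng.normal(loc=separation * outcomes)
    return outcomes, scores

# Model calibrated with the deployment hospital's own data
# (separation chosen so the expected AUROC is about 94.8%).
y_local, s_local = synthetic_predictions(separation=2.30)

# Same model design applied to another hospital without recalibration;
# the weaker separation mimics the ~8 percentage point drop to about 86.3%.
y_cross, s_cross = synthetic_predictions(separation=1.55)

print(f"in-hospital AUROC:    {roc_auc_score(y_local, s_local):.1%}")
print(f"cross-hospital AUROC: {roc_auc_score(y_cross, s_cross):.1%}")

The gap these two numbers illustrate is the paper's central point: the model design can be shared across hospitals, but each deployment still needs calibration with the local hospital's data.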


Bibliographic Details
Main Authors: Sun, Hong, Depraetere, Kristof, Meesseman, Laurent, Cabanillas Silva, Patricia, Szymanowsky, Ralph, Fliegenschmidt, Janis, Hulde, Nikolai, von Dossow, Vera, Vanbiervliet, Martijn, De Baerdemaeker, Jos, Roccaro-Waldmeyer, Diana M, Stieg, Jörg, Domínguez Hidalgo, Manuel, Dahlweid, Fried-Michael
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9214618/
https://www.ncbi.nlm.nih.gov/pubmed/35502887
http://dx.doi.org/10.2196/34295
_version_ 1784731058326994944
author Sun, Hong
Depraetere, Kristof
Meesseman, Laurent
Cabanillas Silva, Patricia
Szymanowsky, Ralph
Fliegenschmidt, Janis
Hulde, Nikolai
von Dossow, Vera
Vanbiervliet, Martijn
De Baerdemaeker, Jos
Roccaro-Waldmeyer, Diana M
Stieg, Jörg
Domínguez Hidalgo, Manuel
Dahlweid, Fried-Michael
author_facet Sun, Hong
Depraetere, Kristof
Meesseman, Laurent
Cabanillas Silva, Patricia
Szymanowsky, Ralph
Fliegenschmidt, Janis
Hulde, Nikolai
von Dossow, Vera
Vanbiervliet, Martijn
De Baerdemaeker, Jos
Roccaro-Waldmeyer, Diana M
Stieg, Jörg
Domínguez Hidalgo, Manuel
Dahlweid, Fried-Michael
author_sort Sun, Hong
collection PubMed
description BACKGROUND: Machine learning algorithms are currently used in a wide array of clinical domains to produce models that can predict clinical risk events. Most models are developed and evaluated with retrospective data, very few are evaluated in a clinical workflow, and even fewer report performance across different hospitals. In this study, we provide detailed evaluations of clinical risk prediction models in live clinical workflows for three different use cases in three different hospitals. OBJECTIVE: The main objective of this study was to evaluate clinical risk prediction models in live clinical workflows and compare their performance in these settings with their performance on retrospective data. We also aimed to generalize the results by applying our investigation to three different use cases in three different hospitals. METHODS: We trained clinical risk prediction models for three use cases (ie, delirium, sepsis, and acute kidney injury) in three different hospitals with retrospective data. We used machine learning and, specifically, deep learning to train models based on the Transformer architecture. The models were trained using a calibration tool that is common to all hospitals and use cases; they shared a common design but were calibrated with each hospital’s specific data. The models were deployed in these three hospitals and used in daily clinical practice, and their predictions were logged and correlated with the diagnosis at discharge. We compared their live performance with evaluations on retrospective data and conducted cross-hospital evaluations. RESULTS: The performance of the prediction models on data from live clinical workflows was similar to their performance on retrospective data: the average area under the receiver operating characteristic curve (AUROC) decreased slightly, by 0.6 percentage points (from 94.8% to 94.2% at discharge). The cross-hospital evaluations exhibited severely reduced performance: the average AUROC decreased by 8 percentage points (from 94.2% to 86.3% at discharge), which indicates the importance of calibrating the model with data from the deployment hospital. CONCLUSIONS: Calibrating the prediction model with data from each deployment hospital led to good performance in live settings. The performance degradation in the cross-hospital evaluations reveals the limitations of developing one generic model for different hospitals: designing a generic model development process that generates specialized prediction models for each hospital ensures model performance across hospitals.
format Online
Article
Text
id pubmed-9214618
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-9214618 2022-06-23 Machine Learning–Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance Sun, Hong Depraetere, Kristof Meesseman, Laurent Cabanillas Silva, Patricia Szymanowsky, Ralph Fliegenschmidt, Janis Hulde, Nikolai von Dossow, Vera Vanbiervliet, Martijn De Baerdemaeker, Jos Roccaro-Waldmeyer, Diana M Stieg, Jörg Domínguez Hidalgo, Manuel Dahlweid, Fried-Michael J Med Internet Res Original Paper BACKGROUND: Machine learning algorithms are currently used in a wide array of clinical domains to produce models that can predict clinical risk events. Most models are developed and evaluated with retrospective data, very few are evaluated in a clinical workflow, and even fewer report performance across different hospitals. In this study, we provide detailed evaluations of clinical risk prediction models in live clinical workflows for three different use cases in three different hospitals. OBJECTIVE: The main objective of this study was to evaluate clinical risk prediction models in live clinical workflows and compare their performance in these settings with their performance on retrospective data. We also aimed to generalize the results by applying our investigation to three different use cases in three different hospitals. METHODS: We trained clinical risk prediction models for three use cases (ie, delirium, sepsis, and acute kidney injury) in three different hospitals with retrospective data. We used machine learning and, specifically, deep learning to train models based on the Transformer architecture. The models were trained using a calibration tool that is common to all hospitals and use cases; they shared a common design but were calibrated with each hospital’s specific data. The models were deployed in these three hospitals and used in daily clinical practice, and their predictions were logged and correlated with the diagnosis at discharge. We compared their live performance with evaluations on retrospective data and conducted cross-hospital evaluations. RESULTS: The performance of the prediction models on data from live clinical workflows was similar to their performance on retrospective data: the average area under the receiver operating characteristic curve (AUROC) decreased slightly, by 0.6 percentage points (from 94.8% to 94.2% at discharge). The cross-hospital evaluations exhibited severely reduced performance: the average AUROC decreased by 8 percentage points (from 94.2% to 86.3% at discharge), which indicates the importance of calibrating the model with data from the deployment hospital. CONCLUSIONS: Calibrating the prediction model with data from each deployment hospital led to good performance in live settings. The performance degradation in the cross-hospital evaluations reveals the limitations of developing one generic model for different hospitals: designing a generic model development process that generates specialized prediction models for each hospital ensures model performance across hospitals. JMIR Publications 2022-06-07 /pmc/articles/PMC9214618/ /pubmed/35502887 http://dx.doi.org/10.2196/34295 Text en ©Hong Sun, Kristof Depraetere, Laurent Meesseman, Patricia Cabanillas Silva, Ralph Szymanowsky, Janis Fliegenschmidt, Nikolai Hulde, Vera von Dossow, Martijn Vanbiervliet, Jos De Baerdemaeker, Diana M Roccaro-Waldmeyer, Jörg Stieg, Manuel Domínguez Hidalgo, Fried-Michael Dahlweid.
Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 07.06.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Sun, Hong
Depraetere, Kristof
Meesseman, Laurent
Cabanillas Silva, Patricia
Szymanowsky, Ralph
Fliegenschmidt, Janis
Hulde, Nikolai
von Dossow, Vera
Vanbiervliet, Martijn
De Baerdemaeker, Jos
Roccaro-Waldmeyer, Diana M
Stieg, Jörg
Domínguez Hidalgo, Manuel
Dahlweid, Fried-Michael
Machine Learning–Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance
title Machine Learning–Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance
title_full Machine Learning–Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance
title_fullStr Machine Learning–Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance
title_full_unstemmed Machine Learning–Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance
title_short Machine Learning–Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance
title_sort machine learning–based prediction models for different clinical risks in different hospitals: evaluation of live performance
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9214618/
https://www.ncbi.nlm.nih.gov/pubmed/35502887
http://dx.doi.org/10.2196/34295
work_keys_str_mv AT sunhong machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT depraeterekristof machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT meessemanlaurent machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT cabanillassilvapatricia machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT szymanowskyralph machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT fliegenschmidtjanis machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT huldenikolai machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT vondossowvera machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT vanbiervlietmartijn machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT debaerdemaekerjos machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT roccarowaldmeyerdianam machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT stiegjorg machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT dominguezhidalgomanuel machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance
AT dahlweidfriedmichael machinelearningbasedpredictionmodelsfordifferentclinicalrisksindifferenthospitalsevaluationofliveperformance