
Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets: experiences from predicting outcomes of Danish trauma patients

INTRODUCTION: Accurately predicting patient outcomes is crucial for improving healthcare delivery, but large-scale risk prediction models are often developed and tested on specific datasets where clinical parameters and outcomes may not fully reflect local clinical settings. Where this is the case,...


Bibliographic Details
Main Authors:	Millarch, Andreas Skov; Bonde, Alexander; Bonde, Mikkel; Klein, Kiril Vadomovic; Folke, Fredrik; Rudolph, Søren Steemann; Sillesen, Martin
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Digital Health
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10656776/ https://www.ncbi.nlm.nih.gov/pubmed/38026835 http://dx.doi.org/10.3389/fdgth.2023.1249258

author	Millarch, Andreas Skov; Bonde, Alexander; Bonde, Mikkel; Klein, Kiril Vadomovic; Folke, Fredrik; Rudolph, Søren Steemann; Sillesen, Martin
author_sort	Millarch, Andreas Skov
collection	PubMed
description	INTRODUCTION: Accurately predicting patient outcomes is crucial for improving healthcare delivery, but large-scale risk prediction models are often developed and tested on specific datasets where clinical parameters and outcomes may not fully reflect local clinical settings. Where this is the case, whether to opt for de-novo training of prediction models on local datasets, direct porting of externally trained models, or a transfer learning approach is not well studied, and constitutes the focus of this study. Using the clinical challenge of predicting mortality and hospital length of stay on a Danish trauma dataset, we hypothesized that a transfer learning approach of models trained on large external datasets would provide optimal prediction results compared to de-novo training on sparse but local datasets or directly porting externally trained models.
METHODS: Using an external dataset of trauma patients from the US Trauma Quality Improvement Program (TQIP) and a local dataset aggregated from the Danish Trauma Database (DTD) enriched with Electronic Health Record data, we tested a range of model-level approaches focused on predicting trauma mortality and hospital length of stay on DTD data. Modeling approaches included de-novo training of models on DTD data, direct porting of models trained on TQIP data to the DTD, and a transfer learning approach by training a model on TQIP data with subsequent transfer and retraining on DTD data. Furthermore, data-level approaches, including mixed dataset training and methods countering imbalanced outcomes (e.g., low mortality rates), were also tested.
RESULTS: Using a neural network trained on a mixed dataset consisting of a subset of TQIP and DTD, with class weighting and transfer learning (retraining on DTD), we achieved excellent results in predicting mortality, with a ROC-AUC of 0.988 and an F2-score of 0.866. The best-performing models for predicting long-term hospitalization were trained only on local data, achieving an ROC-AUC of 0.890 and an F1-score of 0.897, although only marginally better than alternative approaches.
CONCLUSION: Our results suggest that when assessing the optimal modeling approach, it is important to have domain knowledge of how incidence rates and workflows compare between hospital systems and datasets where models are trained. Including data from other health-care systems is particularly beneficial when outcomes are suffering from class imbalance and low incidence. Scenarios where outcomes are not directly comparable are best addressed through either de-novo local training or a transfer learning approach.
format	Online Article Text
id	pubmed-10656776
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-10656776 2023-11-02 Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets: experiences from predicting outcomes of Danish trauma patients Millarch, Andreas Skov; Bonde, Alexander; Bonde, Mikkel; Klein, Kiril Vadomovic; Folke, Fredrik; Rudolph, Søren Steemann; Sillesen, Martin Front Digit Health Digital Health Frontiers Media S.A. 2023-11-02 /pmc/articles/PMC10656776/ /pubmed/38026835 http://dx.doi.org/10.3389/fdgth.2023.1249258 Text en © 2023 Millarch, Bonde, Bonde, Klein, Folke, Rudolph and Sillesen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) (https://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets: experiences from predicting outcomes of Danish trauma patients
topic	Digital Health
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10656776/ https://www.ncbi.nlm.nih.gov/pubmed/38026835 http://dx.doi.org/10.3389/fdgth.2023.1249258
