EHR foundation models improve robustness in the presence of temporal distribution shift
Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. (A schematic sketch of the evaluation protocol described in the full abstract below appears at the end of this record.)
Main Authors: | Guo, Lin Lawrence; Steinberg, Ethan; Fleming, Scott Lanyon; Posada, Jose; Lemmon, Joshua; Pfohl, Stephen R.; Shah, Nigam; Fries, Jason; Sung, Lillian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9992466/ https://www.ncbi.nlm.nih.gov/pubmed/36882576 http://dx.doi.org/10.1038/s41598-023-30820-8 |
_version_ | 1784902316129779712 |
---|---|
author | Guo, Lin Lawrence; Steinberg, Ethan; Fleming, Scott Lanyon; Posada, Jose; Lemmon, Joshua; Pfohl, Stephen R.; Shah, Nigam; Fries, Jason; Sung, Lillian |
author_facet | Guo, Lin Lawrence; Steinberg, Ethan; Fleming, Scott Lanyon; Posada, Jose; Lemmon, Joshua; Pfohl, Stephen R.; Shah, Nigam; Fries, Jason; Sung, Lillian |
author_sort | Guo, Lin Lawrence |
collection | PubMed |
description | Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. The objective was to evaluate the utility of EHR foundation models in improving the in-distribution (ID) and out-of-distribution (OOD) performance of clinical prediction models. Transformer- and gated recurrent unit-based foundation models were pretrained on EHR of up to 1.8 M patients (382 M coded events) collected within pre-determined year groups (e.g., 2009–2012) and were subsequently used to construct patient representations for patients admitted to inpatient units. These representations were used to train logistic regression models to predict hospital mortality, long length of stay, 30-day readmission, and ICU admission. We compared our EHR foundation models with baseline logistic regression models learned on count-based representations (count-LR) in ID and OOD year groups. Performance was measured using area-under-the-receiver-operating-characteristic curve (AUROC), area-under-the-precision-recall curve, and absolute calibration error. Both transformer- and recurrent-based foundation models generally showed better ID and OOD discrimination relative to count-LR and often exhibited less decay in tasks where there is observable degradation of discrimination performance (average AUROC decay of 3% for transformer-based foundation model vs. 7% for count-LR after 5–9 years). In addition, the performance and robustness of transformer-based foundation models continued to improve as pretraining set size increased. These results suggest that pretraining EHR foundation models at scale is a useful approach for developing clinical prediction models that perform well in the presence of temporal distribution shift. |
format | Online Article Text |
id | pubmed-9992466 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9992466 2023-03-09 EHR foundation models improve robustness in the presence of temporal distribution shift Guo, Lin Lawrence Steinberg, Ethan Fleming, Scott Lanyon Posada, Jose Lemmon, Joshua Pfohl, Stephen R. Shah, Nigam Fries, Jason Sung, Lillian Sci Rep Article Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. The objective was to evaluate the utility of EHR foundation models in improving the in-distribution (ID) and out-of-distribution (OOD) performance of clinical prediction models. Transformer- and gated recurrent unit-based foundation models were pretrained on EHR of up to 1.8 M patients (382 M coded events) collected within pre-determined year groups (e.g., 2009–2012) and were subsequently used to construct patient representations for patients admitted to inpatient units. These representations were used to train logistic regression models to predict hospital mortality, long length of stay, 30-day readmission, and ICU admission. We compared our EHR foundation models with baseline logistic regression models learned on count-based representations (count-LR) in ID and OOD year groups. Performance was measured using area-under-the-receiver-operating-characteristic curve (AUROC), area-under-the-precision-recall curve, and absolute calibration error. Both transformer- and recurrent-based foundation models generally showed better ID and OOD discrimination relative to count-LR and often exhibited less decay in tasks where there is observable degradation of discrimination performance (average AUROC decay of 3% for transformer-based foundation model vs. 7% for count-LR after 5–9 years). In addition, the performance and robustness of transformer-based foundation models continued to improve as pretraining set size increased. These results suggest that pretraining EHR foundation models at scale is a useful approach for developing clinical prediction models that perform well in the presence of temporal distribution shift. Nature Publishing Group UK 2023-03-07 /pmc/articles/PMC9992466/ /pubmed/36882576 http://dx.doi.org/10.1038/s41598-023-30820-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Guo, Lin Lawrence Steinberg, Ethan Fleming, Scott Lanyon Posada, Jose Lemmon, Joshua Pfohl, Stephen R. Shah, Nigam Fries, Jason Sung, Lillian EHR foundation models improve robustness in the presence of temporal distribution shift |
title | EHR foundation models improve robustness in the presence of temporal distribution shift |
title_full | EHR foundation models improve robustness in the presence of temporal distribution shift |
title_fullStr | EHR foundation models improve robustness in the presence of temporal distribution shift |
title_full_unstemmed | EHR foundation models improve robustness in the presence of temporal distribution shift |
title_short | EHR foundation models improve robustness in the presence of temporal distribution shift |
title_sort | ehr foundation models improve robustness in the presence of temporal distribution shift |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9992466/ https://www.ncbi.nlm.nih.gov/pubmed/36882576 http://dx.doi.org/10.1038/s41598-023-30820-8 |
work_keys_str_mv | AT guolinlawrence ehrfoundationmodelsimproverobustnessinthepresenceoftemporaldistributionshift AT steinbergethan ehrfoundationmodelsimproverobustnessinthepresenceoftemporaldistributionshift AT flemingscottlanyon ehrfoundationmodelsimproverobustnessinthepresenceoftemporaldistributionshift AT posadajose ehrfoundationmodelsimproverobustnessinthepresenceoftemporaldistributionshift AT lemmonjoshua ehrfoundationmodelsimproverobustnessinthepresenceoftemporaldistributionshift AT pfohlstephenr ehrfoundationmodelsimproverobustnessinthepresenceoftemporaldistributionshift AT shahnigam ehrfoundationmodelsimproverobustnessinthepresenceoftemporaldistributionshift AT friesjason ehrfoundationmodelsimproverobustnessinthepresenceoftemporaldistributionshift AT sunglillian ehrfoundationmodelsimproverobustnessinthepresenceoftemporaldistributionshift |
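
To make the evaluation protocol in the abstract concrete, below is a minimal, self-contained sketch: fit a logistic regression model on patient representations from an in-distribution (ID) year group, then track AUROC on progressively later out-of-distribution (OOD) year groups. It is illustrative only: the data is synthetic, and `make_cohort`, `FEATURE_DIM`, and the drift mechanism are assumptions standing in for the paper's actual EHR cohorts and pretrained (or count-based) representations.

```python
# Schematic sketch only: synthetic stand-in for pretrained patient
# representations and temporal shift; not the authors' pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
FEATURE_DIM = 128                       # assumed size of a patient representation
true_w = rng.normal(size=FEATURE_DIM)   # "true" feature-outcome weights at train time

def make_cohort(n, years_out=0):
    """Generate a synthetic cohort of n patients.

    `years_out` crudely mimics temporal distribution shift by perturbing
    the feature-outcome relationship (concept drift) as time passes.
    """
    X = rng.normal(size=(n, FEATURE_DIM))
    w = true_w + 0.15 * years_out * rng.normal(size=FEATURE_DIM)
    p = 1.0 / (1.0 + np.exp(-(X @ w) * 0.1))  # outcome probability
    y = rng.binomial(1, p)                    # binary label, e.g., hospital mortality
    return X, y

# Train on the in-distribution year group (e.g., 2009-2012 in the paper).
X_train, y_train = make_cohort(5000, years_out=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate discrimination on ID data and on later OOD year groups; the
# drop across rows is the AUROC decay the paper reports.
for label, years_out in [("ID", 0), ("OOD +4y", 4), ("OOD +8y", 8)]:
    X_eval, y_eval = make_cohort(2000, years_out=years_out)
    auroc = roc_auc_score(y_eval, clf.predict_proba(X_eval)[:, 1])
    print(f"{label}: AUROC = {auroc:.3f}")
```

In the study itself, the same train-and-evaluate loop is run twice, once with count-based features (count-LR) and once with representations from the pretrained transformer or GRU foundation model, and the reported robustness gain is the difference in AUROC decay between the two (e.g., 3% vs. 7% after 5–9 years).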