Data heterogeneity in federated learning with Electronic Health Records: Case studies of risk prediction for acute kidney injury and sepsis diseases in critical care

Bibliographic details
Main authors: Rajendran, Suraj; Xu, Zhenxing; Pan, Weishen; Ghosh, Arnab; Wang, Fei
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2023
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10016691/
https://www.ncbi.nlm.nih.gov/pubmed/36920974
http://dx.doi.org/10.1371/journal.pdig.0000117
Collection: PubMed
Description: With the wider availability of healthcare data such as Electronic Health Records (EHR), an increasing number of data-driven approaches have been proposed to improve the quality of care delivery. Predictive modeling, which aims to build computational models that predict clinical risk, is a popular research topic in healthcare analytics. However, privacy concerns about healthcare data may hinder the development of effective, generalizable predictive models, because such models often require rich and diverse data from multiple clinical institutions. Recently, federated learning (FL) has shown promise in addressing this concern. However, data heterogeneity across the participating local sites may affect the prediction performance of federated models. Because acute kidney injury (AKI) and sepsis are highly prevalent among patients admitted to intensive care units (ICU), AI-based early prediction of these conditions is an important topic in critical care medicine. In this study, we take AKI and sepsis onset risk prediction in the ICU as two examples to explore the impact of data heterogeneity in the FL framework and to compare performance across frameworks. We built predictive models based on local, pooled, and FL frameworks using EHR data from multiple hospitals. The local framework used only data from each site itself. The pooled framework combined data from all sites. In the FL framework, no local site had access to other sites' data: a model was updated locally, its parameters were sent to a central aggregator that updated the federated model's parameters, and the updated parameters were then shared with each site. We found that models built within the FL framework outperformed their local counterparts. We then analyzed discrepancies in variable importance across sites and frameworks. Finally, we explored potential sources of heterogeneity within the EHR data: differing distributions of demographic profiles, medication use, and site information contributed to data heterogeneity.
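The three training frameworks described above (local, pooled, and federated, where only parameters travel between sites and a central aggregator) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not name the model or the aggregation rule, so the sketch assumes logistic regression trained by full-batch gradient descent and FedAvg-style averaging weighted by site size. The functions `local_update`, `aggregate`, `federated`, and `accuracy`, and the two-site dataset, are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical record layout: (feature vector, binary outcome label).
Record = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], x: List[float]) -> float:
    # w[0] is the intercept; w[1:] are the per-feature coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def local_update(w: List[float], data: List[Record],
                 lr: float = 0.5, epochs: int = 5) -> List[float]:
    """Full-batch gradient descent on the mean logistic loss, using one site's data only."""
    w = list(w)  # copy so the caller's (global) parameters are left untouched
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(data) for wj, g in zip(w, grad)]
    return w


def aggregate(site_weights: List[List[float]], site_sizes: List[int]) -> List[float]:
    """Central aggregator (assumed FedAvg-style): average parameters weighted by site size."""
    total = sum(site_sizes)
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(len(site_weights[0]))
    ]


def federated(sites: List[List[Record]], n_features: int, rounds: int = 20) -> List[float]:
    w_global = [0.0] * (n_features + 1)
    for _ in range(rounds):
        # Each site starts from the shared model and returns only its parameters.
        local = [local_update(w_global, data) for data in sites]
        w_global = aggregate(local, [len(d) for d in sites])
    return w_global


def accuracy(w: List[float], data: List[Record]) -> float:
    return sum((predict(w, x) >= 0.5) == bool(y) for x, y in data) / len(data)


# Toy, hypothetical sites (not data from the study): site A sees mostly negatives,
# site B is its mirror image and sees mostly positives.
xs_a = [round(0.1 * k, 1) for k in range(-10, 5) if k != 0]
site_a = [([x], int(x > 0)) for x in xs_a]
site_b = [([-x], int(-x > 0)) for x in xs_a]
pooled = site_a + site_b

w_local = local_update([0.0, 0.0], site_a, epochs=100)  # local framework: site A alone
w_pooled = local_update([0.0, 0.0], pooled, epochs=100)  # pooled framework: all records together
w_fed = federated([site_a, site_b], n_features=1)        # FL framework: parameters only

for name, w in [("local A", w_local), ("pooled", w_pooled), ("federated", w_fed)]:
    print(f"{name:>9}: accuracy on all sites = {accuracy(w, pooled):.2f}")
```

The federated run uses 20 rounds of 5 local epochs, so each site performs the same 100 gradient steps as the two baselines. Only the pooled baseline ever concatenates raw records. Because the two toy sites have opposite outcome prevalence, a model trained on site A alone has its intercept pushed negative, which is one simple form of the between-site heterogeneity the study examines.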
ID: pubmed-10016691
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLOS Digit Health (Research Article), published online 2023-03-15
© 2023 Rajendran et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.