Distributed learning for heterogeneous clinical data with application to integrating COVID-19 data across 230 sites

Bibliographic Details
Main authors: Tong, Jiayi, Luo, Chongliang, Islam, Md Nazmul, Sheils, Natalie E., Buresh, John, Edmondson, Mackenzie, Merkel, Peter A., Lautenbach, Ebbing, Duan, Rui, Chen, Yong
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9198031/
https://www.ncbi.nlm.nih.gov/pubmed/35701668
http://dx.doi.org/10.1038/s41746-022-00615-8
collection PubMed
description Integrating real-world data (RWD) from several clinical sites offers great opportunities to improve estimation with a more general population compared to analyses based on a single clinical site. However, sharing patient-level data across sites is practically challenging due to concerns about maintaining patient privacy. We develop a distributed algorithm to integrate heterogeneous RWD from multiple clinical sites without sharing patient-level data. The proposed distributed conditional logistic regression (dCLR) algorithm can effectively account for between-site heterogeneity and requires only one round of communication. Our simulation study and data application with the data of 14,215 COVID-19 patients from 230 clinical sites in the UnitedHealth Group Clinical Research Database demonstrate that the proposed distributed algorithm provides an estimator that is robust to heterogeneity in event rates when efficiently integrating data from multiple clinical sites. Our algorithm is therefore a practical alternative to both meta-analysis and existing distributed algorithms for modeling heterogeneous multi-site binary outcomes.
id pubmed-9198031
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-91980312022-06-16 Distributed learning for heterogeneous clinical data with application to integrating COVID-19 data across 230 sites Tong, Jiayi Luo, Chongliang Islam, Md Nazmul Sheils, Natalie E. Buresh, John Edmondson, Mackenzie Merkel, Peter A. Lautenbach, Ebbing Duan, Rui Chen, Yong NPJ Digit Med Article
Nature Publishing Group UK 2022-06-14 /pmc/articles/PMC9198031/ /pubmed/35701668 http://dx.doi.org/10.1038/s41746-022-00615-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
title Distributed learning for heterogeneous clinical data with application to integrating COVID-19 data across 230 sites
topic Article