
The National COVID Cohort Collaborative: Analyses of Original and Computationally Derived Electronic Health Record Data

BACKGROUND: Computationally derived (“synthetic”) data can enable the creation and analysis of clinical, laboratory, and diagnostic data as if they were the original electronic health record data. Synthetic data can support data sharing to answer critical research questions to address the COVID-19 p...


Bibliographic Details
Main authors: Foraker, Randi; Guo, Aixia; Thomas, Jason; Zamstein, Noa; Payne, Philip RO; Wilcox, Adam
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8491642/
https://www.ncbi.nlm.nih.gov/pubmed/34559671
http://dx.doi.org/10.2196/30697
author Foraker, Randi
Guo, Aixia
Thomas, Jason
Zamstein, Noa
Payne, Philip RO
Wilcox, Adam
collection PubMed
description BACKGROUND: Computationally derived (“synthetic”) data can enable the creation and analysis of clinical, laboratory, and diagnostic data as if they were the original electronic health record data. Synthetic data can support data sharing to answer critical research questions to address the COVID-19 pandemic.
OBJECTIVE: We aim to compare the results from analyses of synthetic data to those from original data and assess the strengths and limitations of leveraging computationally derived data for research purposes.
METHODS: We used the National COVID Cohort Collaborative’s instance of MDClone, a big data platform with data-synthesizing capabilities (MDClone Ltd). We downloaded electronic health record data from 34 National COVID Cohort Collaborative institutional partners and tested three use cases, including (1) exploring the distributions of key features of the COVID-19–positive cohort; (2) training and testing predictive models for assessing the risk of admission among these patients; and (3) determining geospatial and temporal COVID-19–related measures and outcomes, and constructing their epidemic curves. We compared the results from synthetic data to those from original data using traditional statistics, machine learning approaches, and temporal and spatial representations of the data.
RESULTS: For each use case, the results of the synthetic data analyses successfully mimicked those of the original data such that the distributions of the data were similar and the predictive models demonstrated comparable performance. Although the synthetic and original data yielded overall nearly the same results, there were exceptions that included an odds ratio on either side of the null in multivariable analyses (0.97 vs 1.01) and differences in the magnitude of epidemic curves constructed for zip codes with low population counts.
CONCLUSIONS: This paper presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in collaborative research for faster insights.
format Online
Article
Text
id pubmed-8491642
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8491642 2021-12-07 The National COVID Cohort Collaborative: Analyses of Original and Computationally Derived Electronic Health Record Data Foraker, Randi Guo, Aixia Thomas, Jason Zamstein, Noa Payne, Philip RO Wilcox, Adam J Med Internet Res Original Paper JMIR Publications 2021-10-04 /pmc/articles/PMC8491642/ /pubmed/34559671 http://dx.doi.org/10.2196/30697 Text en ©Randi Foraker, Aixia Guo, Jason Thomas, Noa Zamstein, Philip RO Payne, Adam Wilcox, N3C Collaborative. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 04.10.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title The National COVID Cohort Collaborative: Analyses of Original and Computationally Derived Electronic Health Record Data
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8491642/
https://www.ncbi.nlm.nih.gov/pubmed/34559671
http://dx.doi.org/10.2196/30697