
Spot the difference: comparing results of analyses from real patient data and synthetic derivatives



Bibliographic Details
Main Authors: Foraker, Randi E, Yu, Sean C, Gupta, Aditi, Michelson, Andrew P, Pineda Soto, Jose A, Colvin, Ryan, Loh, Francis, Kollef, Marin H, Maddox, Thomas, Evanoff, Bradley, Dror, Hovav, Zamstein, Noa, Lai, Albert M, Payne, Philip R O
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7886551/
https://www.ncbi.nlm.nih.gov/pubmed/33623891
http://dx.doi.org/10.1093/jamiaopen/ooaa060
author Foraker, Randi E
Yu, Sean C
Gupta, Aditi
Michelson, Andrew P
Pineda Soto, Jose A
Colvin, Ryan
Loh, Francis
Kollef, Marin H
Maddox, Thomas
Evanoff, Bradley
Dror, Hovav
Zamstein, Noa
Lai, Albert M
Payne, Philip R O
author_sort Foraker, Randi E
collection PubMed
description BACKGROUND: Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification. OBJECTIVES: To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns. METHODS: We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses using synthetic derivatives to analyses using the original data using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases with the purpose of conducting analyses at the observation level (Use Case 1), patient cohorts (Use Case 2), and population-level data (Use Case 3). RESULTS: For each use case, the results of the analyses were sufficiently statistically similar (P > 0.05) between the synthetic derivative and the real data to draw the same conclusions. DISCUSSION AND CONCLUSION: This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.
format Online
Article
Text
id pubmed-7886551
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7886551
2021-02-22
Spot the difference: comparing results of analyses from real patient data and synthetic derivatives
Foraker, Randi E; Yu, Sean C; Gupta, Aditi; Michelson, Andrew P; Pineda Soto, Jose A; Colvin, Ryan; Loh, Francis; Kollef, Marin H; Maddox, Thomas; Evanoff, Bradley; Dror, Hovav; Zamstein, Noa; Lai, Albert M; Payne, Philip R O
JAMIA Open
Research and Applications
BACKGROUND: Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification. OBJECTIVES: To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns. METHODS: We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses using synthetic derivatives to analyses using the original data using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases with the purpose of conducting analyses at the observation level (Use Case 1), patient cohorts (Use Case 2), and population-level data (Use Case 3). RESULTS: For each use case, the results of the analyses were sufficiently statistically similar (P > 0.05) between the synthetic derivative and the real data to draw the same conclusions. DISCUSSION AND CONCLUSION: This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.
Oxford University Press 2020-12-14
/pmc/articles/PMC7886551/
/pubmed/33623891
http://dx.doi.org/10.1093/jamiaopen/ooaa060
Text en
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.
http://creativecommons.org/licenses/by/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Spot the difference: comparing results of analyses from real patient data and synthetic derivatives
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7886551/
https://www.ncbi.nlm.nih.gov/pubmed/33623891
http://dx.doi.org/10.1093/jamiaopen/ooaa060