
The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures

BACKGROUND: Clinical data synthesis aims at generating realistic data for healthcare research, system implementation and training. It protects patient confidentiality, deepens our understanding of the complexity in healthcare, and is a promising tool for situations where real world data is difficult...


Bibliographic Details
Main authors: Chen, Junqiao, Chun, David, Patel, Milesh, Chiang, Epson, James, Jesse
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6416981/
https://www.ncbi.nlm.nih.gov/pubmed/30871520
http://dx.doi.org/10.1186/s12911-019-0793-0
author Chen, Junqiao
Chun, David
Patel, Milesh
Chiang, Epson
James, Jesse
collection PubMed
description BACKGROUND: Clinical data synthesis aims at generating realistic data for healthcare research, system implementation and training. It protects patient confidentiality, deepens our understanding of the complexity in healthcare, and is a promising tool for situations where real-world data is difficult to obtain or unnecessary. However, its validity has not been fully examined, and no previous study has validated it from the perspective of healthcare quality, a critical aspect of a healthcare system. This study fills this gap by calculating clinical quality measures using synthetic data. METHODS: We examined Synthea, an open-source, well-documented synthetic data generator that incorporates the key advancements in this emerging technique. We selected a representative 1.2-million Massachusetts patient cohort generated by Synthea. Four quality measures, Colorectal Cancer Screening, Chronic Obstructive Pulmonary Disease (COPD) 30-Day Mortality, Rate of Complications after Hip/Knee Replacement, and Controlling High Blood Pressure, were selected based on clinical significance. Calculated rates were then compared with publicly reported rates based on real-world data for Massachusetts and the United States. RESULTS: Of the total Synthea Massachusetts population (n = 1,193,439), 394,476 were eligible for the “colorectal cancer screening” quality measure, and 248,433 (63%) were considered compliant, compared with publicly reported Massachusetts and national rates of 77.3% and 69.8%, respectively. Of the 409 eligible patients, 0.7% died within 30 days after COPD exacerbation, versus 7% reported in Massachusetts and 8% nationally. Using an expanded logic, this rate increased to 5.7%. No Synthea residents had complications after Hip/Knee Replacement (Massachusetts: 2.9%, national: 2.8%) or had their blood pressure controlled after being diagnosed with hypertension (Massachusetts: 74.52%, national: 69.7%). Results show that Synthea is quite reliable in modeling demographics and probabilities of services being offered in an average healthcare setting. However, its capabilities to model heterogeneous health outcomes post services are limited. CONCLUSIONS: Synthea and other synthetic patient generators do not currently model deviations in care or the potential outcomes that may result from them. To output a more realistic data set, we propose that synthetic data generators should consider important quality measures in their logic and model when clinicians may deviate from standard practice.
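The colorectal cancer screening figure in the RESULTS can be checked directly from the counts given in the abstract. The following is a minimal sketch in Python, not the authors' code: the function name `rate_percent` and the variable names are hypothetical, and only the counts and the publicly reported rates come from the abstract.

```python
def rate_percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# Counts and benchmark rates as stated in the abstract.
ELIGIBLE = 394_476    # Synthea patients eligible for the screening measure
COMPLIANT = 248_433   # eligible patients considered compliant
REPORTED = {"Massachusetts": 77.3, "United States": 69.8}  # percent

synthea_rate = rate_percent(COMPLIANT, ELIGIBLE)

# Difference in percentage points between Synthea and each reported rate
# (negative means the synthetic cohort is below the real-world figure).
gaps = {region: synthea_rate - reported for region, reported in REPORTED.items()}

print(f"Synthea compliance rate: {synthea_rate:.1f}%")
for region, gap in gaps.items():
    print(f"Gap vs {region}: {gap:+.1f} percentage points")
```

The computed rate rounds to the 63% quoted in the abstract, and the gaps make the shortfall against both real-world benchmarks explicit.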
format Online
Article
Text
id pubmed-6416981
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6416981 2019-03-25 The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures Chen, Junqiao Chun, David Patel, Milesh Chiang, Epson James, Jesse BMC Med Inform Decis Mak Research Article BioMed Central 2019-03-14 /pmc/articles/PMC6416981/ /pubmed/30871520 http://dx.doi.org/10.1186/s12911-019-0793-0 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures
topic Research Article