The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures
Main authors: | Chen, Junqiao; Chun, David; Patel, Milesh; Chiang, Epson; James, Jesse |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6416981/ https://www.ncbi.nlm.nih.gov/pubmed/30871520 http://dx.doi.org/10.1186/s12911-019-0793-0 |
author | Chen, Junqiao; Chun, David; Patel, Milesh; Chiang, Epson; James, Jesse |
---|---|
collection | PubMed |
description | BACKGROUND: Clinical data synthesis aims to generate realistic data for healthcare research, system implementation and training. It protects patient confidentiality, deepens our understanding of the complexity of healthcare, and is a promising tool for situations where real-world data are difficult to obtain or unnecessary. However, its validity has not been fully examined, and no previous study has validated it from the perspective of healthcare quality, a critical aspect of any healthcare system. This study fills this gap by calculating clinical quality measures using synthetic data. METHODS: We examined Synthea, an open-source, well-documented synthetic data generator that incorporates the key advancements in this emerging technique. We selected a representative cohort of 1.2 million Massachusetts patients generated by Synthea. Four quality measures, Colorectal Cancer Screening, Chronic Obstructive Pulmonary Disease (COPD) 30-Day Mortality, Rate of Complications after Hip/Knee Replacement, and Controlling High Blood Pressure, were selected based on clinical significance. The calculated rates were then compared with publicly reported rates based on real-world data from Massachusetts and the United States. RESULTS: Of the total Synthea Massachusetts population (n = 1,193,439), 394,476 were eligible for the “colorectal cancer screening” quality measure, and 248,433 (63%) were considered compliant, compared with publicly reported Massachusetts and national rates of 77.3% and 69.8%, respectively. Of the 409 eligible patients, 0.7% died within 30 days after a COPD exacerbation, versus 7% reported in Massachusetts and 8% nationally. Using an expanded logic, this rate increased to 5.7%. No Synthea residents had complications after Hip/Knee Replacement (Massachusetts: 2.9%, national: 2.8%) or had their blood pressure controlled after being diagnosed with hypertension (Massachusetts: 74.52%, national: 69.7%). The results show that Synthea is quite reliable in modeling demographics and the probabilities of services being offered in an average healthcare setting. However, its ability to model heterogeneous health outcomes after those services is limited. CONCLUSIONS: Synthea and other synthetic patient generators do not currently model deviations in care or the potential outcomes that may result from such deviations. To output a more realistic data set, we propose that synthetic data generators incorporate important quality measures into their logic and model when clinicians may deviate from standard practice. |
format | Online Article Text |
id | pubmed-6416981 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6416981 2019-03-25; BMC Med Inform Decis Mak; Research Article; BioMed Central, 2019-03-14; Text en; © The Author(s). 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Research Article |
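The colorectal cancer screening result in the description is a proportion-style quality measure: compliant patients divided by eligible patients, then set against published benchmark rates. The minimal Python sketch below reproduces that arithmetic from the counts reported in the abstract. It is not the authors' code: the `MeasureResult` class and `gap_vs_benchmarks` helper are hypothetical names, and the benchmarks are simply the Massachusetts and national percentages quoted in the abstract. Only this measure is used because it is the one for which the abstract reports both the numerator and the denominator.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MeasureResult:
    """Hypothetical numerator/denominator pair for a proportion-style quality measure."""
    name: str
    eligible: int   # denominator: patients meeting the measure's eligibility criteria
    compliant: int  # numerator: eligible patients who satisfied the measure

    def rate(self) -> float:
        """Return the compliance rate as a fraction between 0 and 1."""
        if self.eligible <= 0:
            raise ValueError("denominator must be positive")
        if not 0 <= self.compliant <= self.eligible:
            raise ValueError("numerator must lie between 0 and the denominator")
        return self.compliant / self.eligible


def gap_vs_benchmarks(result: MeasureResult, benchmarks: dict[str, float]) -> dict[str, float]:
    """Percentage-point difference between the synthetic rate and each benchmark (given in %)."""
    synthetic_pct = 100 * result.rate()
    return {label: round(synthetic_pct - bench, 1) for label, bench in benchmarks.items()}


# Counts and benchmark rates as reported in the abstract.
crc = MeasureResult("Colorectal Cancer Screening", eligible=394_476, compliant=248_433)
print(f"{crc.name}: {100 * crc.rate():.1f}%")
print(gap_vs_benchmarks(crc, {"Massachusetts": 77.3, "National": 69.8}))
```

Running it prints a rate of 63.0% and gaps of -14.3 and -6.8 percentage points against the Massachusetts and national benchmarks, which matches the abstract's rounded 63% figure. Keeping the numerator and denominator together in one object makes the eligibility logic the thing to audit: the synthetic rate is only as meaningful as the criteria that decide who enters the denominator.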