Conditional generation of medical time series for extrapolation to underrepresented populations
The widespread adoption of electronic health records (EHRs) and subsequent increased availability of longitudinal healthcare data have led to significant advances in our understanding of health and disease with direct and immediate impact on the development of new diagnostics and therapeutic treatmen...
Main authors: | Bing, Simon; Dittadi, Andrea; Bauer, Stefan; Schwab, Patrick |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931259/ https://www.ncbi.nlm.nih.gov/pubmed/36812549 http://dx.doi.org/10.1371/journal.pdig.0000074 |
_version_ | 1784889209746620416 |
---|---|
author | Bing, Simon Dittadi, Andrea Bauer, Stefan Schwab, Patrick |
author_facet | Bing, Simon Dittadi, Andrea Bauer, Stefan Schwab, Patrick |
author_sort | Bing, Simon |
collection | PubMed |
description | The widespread adoption of electronic health records (EHRs) and subsequent increased availability of longitudinal healthcare data have led to significant advances in our understanding of health and disease with direct and immediate impact on the development of new diagnostics and therapeutic treatment options. However, access to EHRs is often restricted due to their perceived sensitive nature and associated legal concerns, and the cohorts therein typically are those seen at a specific hospital or network of hospitals and therefore not representative of the wider population of patients. Here, we present HealthGen, a new approach for the conditional generation of synthetic EHRs that maintains an accurate representation of real patient characteristics, temporal information and missingness patterns. We demonstrate experimentally that HealthGen generates synthetic cohorts that are significantly more faithful to real patient EHRs than the current state-of-the-art, and that augmenting real data sets with conditionally generated cohorts of underrepresented subpopulations of patients can significantly enhance the generalisability of models derived from these data sets to different patient populations. Synthetic conditionally generated EHRs could help increase the accessibility of longitudinal healthcare data sets and improve the generalisability of inferences made from these data sets to underrepresented populations. |
format | Online Article Text |
id | pubmed-9931259 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9931259 2023-02-16 PLOS Digit Health Research Article Public Library of Science 2022-07-19 /pmc/articles/PMC9931259/ /pubmed/36812549 http://dx.doi.org/10.1371/journal.pdig.0000074 Text en © 2022 Bing et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Conditional generation of medical time series for extrapolation to underrepresented populations |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931259/ https://www.ncbi.nlm.nih.gov/pubmed/36812549 http://dx.doi.org/10.1371/journal.pdig.0000074 |