
Conditional generation of medical time series for extrapolation to underrepresented populations

The widespread adoption of electronic health records (EHRs) and subsequent increased availability of longitudinal healthcare data has led to significant advances in our understanding of health and disease with direct and immediate impact on the development of new diagnostics and therapeutic treatmen...


Bibliographic Details
Main Authors: Bing, Simon, Dittadi, Andrea, Bauer, Stefan, Schwab, Patrick
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931259/
https://www.ncbi.nlm.nih.gov/pubmed/36812549
http://dx.doi.org/10.1371/journal.pdig.0000074
author Bing, Simon
Dittadi, Andrea
Bauer, Stefan
Schwab, Patrick
collection PubMed
description The widespread adoption of electronic health records (EHRs) and subsequent increased availability of longitudinal healthcare data has led to significant advances in our understanding of health and disease with direct and immediate impact on the development of new diagnostics and therapeutic treatment options. However, access to EHRs is often restricted due to their perceived sensitive nature and associated legal concerns, and the cohorts therein typically are those seen at a specific hospital or network of hospitals and therefore not representative of the wider population of patients. Here, we present HealthGen, a new approach for the conditional generation of synthetic EHRs that maintains an accurate representation of real patient characteristics, temporal information and missingness patterns. We demonstrate experimentally that HealthGen generates synthetic cohorts that are significantly more faithful to real patient EHRs than the current state-of-the-art, and that augmenting real data sets with conditionally generated cohorts of underrepresented subpopulations of patients can significantly enhance the generalisability of models derived from these data sets to different patient populations. Synthetic conditionally generated EHRs could help increase the accessibility of longitudinal healthcare data sets and improve the generalisability of inferences made from these data sets to underrepresented populations.
format Online
Article
Text
id pubmed-9931259
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9931259 2023-02-16 Conditional generation of medical time series for extrapolation to underrepresented populations. Bing, Simon; Dittadi, Andrea; Bauer, Stefan; Schwab, Patrick. PLOS Digit Health. Research Article.
Public Library of Science 2022-07-19 /pmc/articles/PMC9931259/ /pubmed/36812549 http://dx.doi.org/10.1371/journal.pdig.0000074 Text en © 2022 Bing et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Conditional generation of medical time series for extrapolation to underrepresented populations
topic Research Article