
EHRtemporalVariability: delineating temporal data-set shifts in electronic health records

BACKGROUND: Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal data-set shifts can present as trends, as we...


Bibliographic Details
Main authors: Sáez, Carlos; Gutiérrez-Sacristán, Alba; Kohane, Isaac; García-Gómez, Juan M; Avillach, Paul
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7391413/
https://www.ncbi.nlm.nih.gov/pubmed/32729900
http://dx.doi.org/10.1093/gigascience/giaa079
_version_ 1783564631348871168
author Sáez, Carlos
Gutiérrez-Sacristán, Alba
Kohane, Isaac
García-Gómez, Juan M
Avillach, Paul
author_sort Sáez, Carlos
collection PubMed
description BACKGROUND: Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal data-set shifts can present as trends, as well as abrupt or seasonal changes in the statistical distributions of data over time. The latter are particularly complicated to address in multimodal and highly coded data. These changes, if not delineated, can harm population and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large sets of historical data from EHRs, there is a need for specific software methods to help delineate temporal data-set shifts to ensure reliable data reuse. RESULTS: EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal data-set shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time; projects their temporal evolution through non-parametric information geometric temporal plots; and enables the exploration of changes in variables through data temporal heat maps. We demonstrate the capability of EHRtemporalVariability to delineate data-set shifts in three impact case studies, one of which is available for reproducibility. CONCLUSIONS: EHRtemporalVariability enables the exploration and identification of data-set shifts, contributing to the broad examination and repurposing of large, longitudinal data sets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. EHRtemporalVariability is designed for technical users who are programmatically utilizing the R package, as well as users who are not familiar with programming via the Shiny user interface. 
Availability: https://github.com/hms-dbmi/EHRtemporalVariability/ Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html Online demo: http://ehrtemporalvariability.upv.es/
format Online
Article
Text
id pubmed-7391413
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7391413 2020-08-04 Gigascience Technical Note Oxford University Press 2020-07-30 /pmc/articles/PMC7391413/ /pubmed/32729900 http://dx.doi.org/10.1093/gigascience/giaa079 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title EHRtemporalVariability: delineating temporal data-set shifts in electronic health records
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7391413/
https://www.ncbi.nlm.nih.gov/pubmed/32729900
http://dx.doi.org/10.1093/gigascience/giaa079
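The RESULTS part of the abstract describes estimating the statistical distributions of coded data over time and then looking for changes between time batches. The idea can be illustrated with a minimal Python sketch. This is not the EHRtemporalVariability API (the package itself is written in R): the function names `monthly_distributions` and `js_distance`, the monthly batching, the toy records, and the choice of the Jensen-Shannon distance as the dissimilarity between batches are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def monthly_distributions(records):
    """Estimate the relative frequency of each code per month.

    `records` is an iterable of (iso_date, code) pairs (hypothetical input
    format). Returns (months, codes, matrix), where matrix[i][j] is the
    relative frequency of codes[j] in months[i]. The matrix is the kind of
    table a temporal heat map would display.
    """
    counts = defaultdict(Counter)
    for iso_date, code in records:
        counts[iso_date[:7]][code] += 1  # batch by "YYYY-MM"
    months = sorted(counts)
    codes = sorted({code for month in months for code in counts[month]})
    matrix = []
    for month in months:
        total = sum(counts[month].values())
        matrix.append([counts[month][code] / total for code in codes])
    return months, codes, matrix


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logarithm) between two distributions."""
    def kl(a, b):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mid = [(x + y) / 2 for x, y in zip(p, q)]
    divergence = 0.5 * kl(p, mid) + 0.5 * kl(q, mid)
    return math.sqrt(max(divergence, 0.0))


# Toy data (assumed): code "B" is replaced by code "C" in March.
records = [
    ("2020-01-05", "A"), ("2020-01-17", "B"),
    ("2020-02-03", "A"), ("2020-02-21", "B"),
    ("2020-03-09", "A"), ("2020-03-30", "C"),
]

months, codes, matrix = monthly_distributions(records)

# Distance between each pair of consecutive months; a jump suggests a shift.
shifts = {
    (months[i], months[i + 1]): js_distance(matrix[i], matrix[i + 1])
    for i in range(len(months) - 1)
}
```

In this toy example the January and February distributions are identical, so their distance is zero, while the February to March pair shows a clear jump caused by the code replacement. A full analysis would compare all pairs of batches and project them into a low-dimensional plot, which this sketch does not attempt.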