EHRtemporalVariability: delineating temporal data-set shifts in electronic health records
BACKGROUND: Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal data-set shifts can present as trends, as we...
Main authors: | Sáez, Carlos; Gutiérrez-Sacristán, Alba; Kohane, Isaac; García-Gómez, Juan M; Avillach, Paul |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Technical Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7391413/ https://www.ncbi.nlm.nih.gov/pubmed/32729900 http://dx.doi.org/10.1093/gigascience/giaa079 |
_version_ | 1783564631348871168 |
---|---|
author | Sáez, Carlos Gutiérrez-Sacristán, Alba Kohane, Isaac García-Gómez, Juan M Avillach, Paul |
author_sort | Sáez, Carlos |
collection | PubMed |
description | BACKGROUND: Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal data-set shifts can present as trends, as well as abrupt or seasonal changes in the statistical distributions of data over time. The latter are particularly complicated to address in multimodal and highly coded data. These changes, if not delineated, can harm population and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large sets of historical data from EHRs, there is a need for specific software methods to help delineate temporal data-set shifts to ensure reliable data reuse. RESULTS: EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal data-set shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time; projects their temporal evolution through non-parametric information geometric temporal plots; and enables the exploration of changes in variables through data temporal heat maps. We demonstrate the capability of EHRtemporalVariability to delineate data-set shifts in three impact case studies, one of which is available for reproducibility. CONCLUSIONS: EHRtemporalVariability enables the exploration and identification of data-set shifts, contributing to the broad examination and repurposing of large, longitudinal data sets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. EHRtemporalVariability is designed for technical users who are programmatically utilizing the R package, as well as users who are not familiar with programming via the Shiny user interface. 
Availability: https://github.com/hms-dbmi/EHRtemporalVariability/ Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html Online demo: http://ehrtemporalvariability.upv.es/ |
format | Online Article Text |
id | pubmed-7391413 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-73914132020-08-04 EHRtemporalVariability: delineating temporal data-set shifts in electronic health records Sáez, Carlos Gutiérrez-Sacristán, Alba Kohane, Isaac García-Gómez, Juan M Avillach, Paul Gigascience Technical Note BACKGROUND: Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal data-set shifts can present as trends, as well as abrupt or seasonal changes in the statistical distributions of data over time. The latter are particularly complicated to address in multimodal and highly coded data. These changes, if not delineated, can harm population and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large sets of historical data from EHRs, there is a need for specific software methods to help delineate temporal data-set shifts to ensure reliable data reuse. RESULTS: EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal data-set shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time; projects their temporal evolution through non-parametric information geometric temporal plots; and enables the exploration of changes in variables through data temporal heat maps. We demonstrate the capability of EHRtemporalVariability to delineate data-set shifts in three impact case studies, one of which is available for reproducibility. CONCLUSIONS: EHRtemporalVariability enables the exploration and identification of data-set shifts, contributing to the broad examination and repurposing of large, longitudinal data sets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. 
EHRtemporalVariability is designed for technical users who are programmatically utilizing the R package, as well as users who are not familiar with programming via the Shiny user interface. Availability: https://github.com/hms-dbmi/EHRtemporalVariability/ Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html Online demo: http://ehrtemporalvariability.upv.es/ Oxford University Press 2020-07-30 /pmc/articles/PMC7391413/ /pubmed/32729900 http://dx.doi.org/10.1093/gigascience/giaa079 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Technical Note Sáez, Carlos Gutiérrez-Sacristán, Alba Kohane, Isaac García-Gómez, Juan M Avillach, Paul EHRtemporalVariability: delineating temporal data-set shifts in electronic health records |
title | EHRtemporalVariability: delineating temporal data-set shifts in electronic health records |
title_sort | ehrtemporalvariability: delineating temporal data-set shifts in electronic health records |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7391413/ https://www.ncbi.nlm.nih.gov/pubmed/32729900 http://dx.doi.org/10.1093/gigascience/giaa079 |
work_keys_str_mv | AT saezcarlos ehrtemporalvariabilitydelineatingtemporaldatasetshiftsinelectronichealthrecords AT gutierrezsacristanalba ehrtemporalvariabilitydelineatingtemporaldatasetshiftsinelectronichealthrecords AT kohaneisaac ehrtemporalvariabilitydelineatingtemporaldatasetshiftsinelectronichealthrecords AT garciagomezjuanm ehrtemporalvariabilitydelineatingtemporaldatasetshiftsinelectronichealthrecords AT avillachpaul ehrtemporalvariabilitydelineatingtemporaldatasetshiftsinelectronichealthrecords |
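
The abstract describes estimating the statistical distributions of coded data over time and comparing them across time batches. The following is a minimal Python sketch of that general idea, not the package's R implementation. The monthly batching, the toy records, the function names, and the choice of the Jensen-Shannon distance as the dissimilarity between batches are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter, defaultdict
from math import log2, sqrt


def monthly_distributions(records):
    """Map each 'YYYY-MM' batch to the relative frequency of each code."""
    counts = defaultdict(Counter)
    for date, code in records:
        counts[date[:7]][code] += 1  # 'YYYY-MM-DD' -> 'YYYY-MM'
    codes = sorted({code for _, code in records})
    table = {}
    for month in sorted(counts):
        total = sum(counts[month].values())
        table[month] = [counts[month][c] / total for c in codes]
    return codes, table


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence (range 0 to 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    # max() guards against tiny negative values from rounding.
    return sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


# Hypothetical toy data: code "A" is replaced by code "B" from March onward.
records = [
    ("2020-01-05", "A"), ("2020-01-20", "A"),
    ("2020-02-03", "A"), ("2020-02-17", "A"),
    ("2020-03-02", "B"), ("2020-03-15", "B"),
]

codes, table = monthly_distributions(records)
months = list(table)

# Pairwise dissimilarity between monthly batches.
distances = {
    (a, b): jensen_shannon_distance(table[a], table[b])
    for a in months
    for b in months
}
```

In this sketch, `table` holds one row of relative frequencies per month, which is the kind of matrix a temporal heat map would display, and `distances` is the kind of batch-to-batch dissimilarity matrix that a low-dimensional projection could take as input. An abrupt coding change, like the switch from "A" to "B" in the toy data, shows up as a large distance between adjacent months.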