Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities
Electronic health records (EHRs) have been successfully used in data science and machine learning projects. However, most of these data are collected for clinical use rather than for retrospective analysis. This means that researchers typically face many different issues when attempting to access an...
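Main authors: | Alexander Maletzky, Carl Böck, Thomas Tschoellitsch, Theresa Roland, Helga Ludwig, Stefan Thumfart, Michael Giretzlehner, Sepp Hochreiter, Jens Meier |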
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9636533/ https://www.ncbi.nlm.nih.gov/pubmed/36269654 http://dx.doi.org/10.2196/38557 |
_version_ | 1784824965350031360 |
---|---|
author | Maletzky, Alexander Böck, Carl Tschoellitsch, Thomas Roland, Theresa Ludwig, Helga Thumfart, Stefan Giretzlehner, Michael Hochreiter, Sepp Meier, Jens |
author_facet | Maletzky, Alexander Böck, Carl Tschoellitsch, Thomas Roland, Theresa Ludwig, Helga Thumfart, Stefan Giretzlehner, Michael Hochreiter, Sepp Meier, Jens |
author_sort | Maletzky, Alexander |
collection | PubMed |
description | Electronic health records (EHRs) have been successfully used in data science and machine learning projects. However, most of these data are collected for clinical use rather than for retrospective analysis. This means that researchers typically face many different issues when attempting to access and prepare the data for secondary use. We aimed to investigate how raw EHRs can be accessed and prepared in retrospective data science projects in a disciplined, effective, and efficient way. We report our experience and findings from a large-scale data science project analyzing routinely acquired retrospective data from the Kepler University Hospital in Linz, Austria. The project involved data collection from more than 150,000 patients over a period of 10 years. It included diverse data modalities, such as static demographic data, irregularly acquired laboratory test results, regularly sampled vital signs, and high-frequency physiological waveform signals. Raw medical data can be corrupted in many unexpected ways that demand thorough manual inspection and highly individualized data cleaning solutions. We present a general data preparation workflow, which was shaped in the course of our project and consists of the following 7 steps: obtain a rough overview of the available EHR data, define clinically meaningful labels for supervised learning, extract relevant data from the hospital’s data warehouses, match data extracted from different sources, deidentify them, detect errors and inconsistencies therein through a careful exploratory analysis, and implement a suitable data processing pipeline in actual code. Only a few of the data preparation issues encountered in our project were addressed by generic medical data preprocessing tools that have been proposed recently. Instead, highly individualized solutions for the specific data used in one’s own research seem inevitable. We believe that the proposed workflow can serve as guidance for practitioners, helping them to identify and address potential problems early and avoid some common pitfalls. |
format | Online Article Text |
id | pubmed-9636533 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-9636533 2022-11-06 Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities Maletzky, Alexander Böck, Carl Tschoellitsch, Thomas Roland, Theresa Ludwig, Helga Thumfart, Stefan Giretzlehner, Michael Hochreiter, Sepp Meier, Jens JMIR Med Inform Viewpoint Electronic health records (EHRs) have been successfully used in data science and machine learning projects. However, most of these data are collected for clinical use rather than for retrospective analysis. This means that researchers typically face many different issues when attempting to access and prepare the data for secondary use. We aimed to investigate how raw EHRs can be accessed and prepared in retrospective data science projects in a disciplined, effective, and efficient way. We report our experience and findings from a large-scale data science project analyzing routinely acquired retrospective data from the Kepler University Hospital in Linz, Austria. The project involved data collection from more than 150,000 patients over a period of 10 years. It included diverse data modalities, such as static demographic data, irregularly acquired laboratory test results, regularly sampled vital signs, and high-frequency physiological waveform signals. Raw medical data can be corrupted in many unexpected ways that demand thorough manual inspection and highly individualized data cleaning solutions. We present a general data preparation workflow, which was shaped in the course of our project and consists of the following 7 steps: obtain a rough overview of the available EHR data, define clinically meaningful labels for supervised learning, extract relevant data from the hospital’s data warehouses, match data extracted from different sources, deidentify them, detect errors and inconsistencies therein through a careful exploratory analysis, and implement a suitable data processing pipeline in actual code. Only a few of the data preparation issues encountered in our project were addressed by generic medical data preprocessing tools that have been proposed recently. Instead, highly individualized solutions for the specific data used in one’s own research seem inevitable. We believe that the proposed workflow can serve as guidance for practitioners, helping them to identify and address potential problems early and avoid some common pitfalls. JMIR Publications 2022-10-21 /pmc/articles/PMC9636533/ /pubmed/36269654 http://dx.doi.org/10.2196/38557 Text en ©Alexander Maletzky, Carl Böck, Thomas Tschoellitsch, Theresa Roland, Helga Ludwig, Stefan Thumfart, Michael Giretzlehner, Sepp Hochreiter, Jens Meier. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 21.10.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Viewpoint Maletzky, Alexander Böck, Carl Tschoellitsch, Thomas Roland, Theresa Ludwig, Helga Thumfart, Stefan Giretzlehner, Michael Hochreiter, Sepp Meier, Jens Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities |
title | Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities |
title_full | Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities |
title_fullStr | Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities |
title_full_unstemmed | Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities |
title_short | Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities |
title_sort | lifting hospital electronic health record data treasures: challenges and opportunities |
topic | Viewpoint |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9636533/ https://www.ncbi.nlm.nih.gov/pubmed/36269654 http://dx.doi.org/10.2196/38557 |
work_keys_str_mv | AT maletzkyalexander liftinghospitalelectronichealthrecorddatatreasureschallengesandopportunities AT bockcarl liftinghospitalelectronichealthrecorddatatreasureschallengesandopportunities AT tschoellitschthomas liftinghospitalelectronichealthrecorddatatreasureschallengesandopportunities AT rolandtheresa liftinghospitalelectronichealthrecorddatatreasureschallengesandopportunities AT ludwighelga liftinghospitalelectronichealthrecorddatatreasureschallengesandopportunities AT thumfartstefan liftinghospitalelectronichealthrecorddatatreasureschallengesandopportunities AT giretzlehnermichael liftinghospitalelectronichealthrecorddatatreasureschallengesandopportunities AT hochreitersepp liftinghospitalelectronichealthrecorddatatreasureschallengesandopportunities AT meierjens liftinghospitalelectronichealthrecorddatatreasureschallengesandopportunities |
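
Three of the workflow steps named in the description (matching data extracted from different sources, deidentifying them, and detecting errors) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record layout, the field names, the `PLAUSIBLE_RANGES` table, and every function below are hypothetical, and replacing an identifier with a salted hash is only a simple pseudonymization that by itself would likely not satisfy real deidentification requirements.

```python
import hashlib

# Hypothetical toy records; all field names and values are illustrative only.
demographics = [
    {"patient_id": "P001", "birth_year": 1950},
    {"patient_id": "P002", "birth_year": 1987},
]
measurements = [
    {"patient_id": "P001", "test": "heart_rate", "value": 72.0},
    {"patient_id": "P002", "test": "heart_rate", "value": -5.0},  # implausible value
    {"patient_id": "P999", "test": "heart_rate", "value": 80.0},  # no matching patient
]

# Assumed plausibility bounds for illustration, not clinical reference ranges.
PLAUSIBLE_RANGES = {"heart_rate": (0.0, 300.0)}


def match_records(demographics, measurements):
    """Join measurements to demographics by patient_id; return (matched, unmatched)."""
    by_id = {row["patient_id"]: row for row in demographics}
    matched, unmatched = [], []
    for item in measurements:
        demo = by_id.get(item["patient_id"])
        if demo is None:
            unmatched.append(item)  # keep for manual inspection instead of dropping
        else:
            matched.append({**demo, **item})
    return matched, unmatched


def pseudonymize(patient_id, salt):
    """Map an identifier to a truncated salted SHA-256 digest."""
    return hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()[:12]


def deidentify(rows, salt):
    """Replace the patient_id field of each row with a pseudonym."""
    cleaned = []
    for row in rows:
        copy = dict(row)
        copy["pseudonym"] = pseudonymize(copy.pop("patient_id"), salt)
        cleaned.append(copy)
    return cleaned


def flag_implausible(rows):
    """Add a boolean 'plausible' field based on the assumed value ranges."""
    flagged = []
    for row in rows:
        low, high = PLAUSIBLE_RANGES[row["test"]]
        flagged.append({**row, "plausible": low <= row["value"] <= high})
    return flagged


matched, unmatched = match_records(demographics, measurements)
prepared = flag_implausible(deidentify(matched, salt="example-salt"))
```

Unmatched and implausible rows are flagged rather than silently discarded, in line with the description's point that raw data demand manual inspection before any cleaning rule is applied.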