
Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities

Electronic health records (EHRs) have been successfully used in data science and machine learning projects. However, most of these data are collected for clinical use rather than for retrospective analysis. This means that researchers typically face many different issues when attempting to access an...


Bibliographic Details
Main authors: Maletzky, Alexander; Böck, Carl; Tschoellitsch, Thomas; Roland, Theresa; Ludwig, Helga; Thumfart, Stefan; Giretzlehner, Michael; Hochreiter, Sepp; Meier, Jens
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9636533/
https://www.ncbi.nlm.nih.gov/pubmed/36269654
http://dx.doi.org/10.2196/38557
author Maletzky, Alexander
Böck, Carl
Tschoellitsch, Thomas
Roland, Theresa
Ludwig, Helga
Thumfart, Stefan
Giretzlehner, Michael
Hochreiter, Sepp
Meier, Jens
author_sort Maletzky, Alexander
collection PubMed
description Electronic health records (EHRs) have been successfully used in data science and machine learning projects. However, most of these data are collected for clinical use rather than for retrospective analysis. This means that researchers typically face many different issues when attempting to access and prepare the data for secondary use. We aimed to investigate how raw EHRs can be accessed and prepared in retrospective data science projects in a disciplined, effective, and efficient way. We report our experience and findings from a large-scale data science project analyzing routinely acquired retrospective data from the Kepler University Hospital in Linz, Austria. The project involved data collection from more than 150,000 patients over a period of 10 years. It included diverse data modalities, such as static demographic data, irregularly acquired laboratory test results, regularly sampled vital signs, and high-frequency physiological waveform signals. Raw medical data can be corrupted in many unexpected ways that demand thorough manual inspection and highly individualized data cleaning solutions. We present a general data preparation workflow, which was shaped in the course of our project and consists of the following 7 steps: obtain a rough overview of the available EHR data, define clinically meaningful labels for supervised learning, extract relevant data from the hospital’s data warehouses, match data extracted from different sources, deidentify them, detect errors and inconsistencies therein through a careful exploratory analysis, and implement a suitable data processing pipeline in actual code. Only a few of the data preparation issues encountered in our project were addressed by generic medical data preprocessing tools that have been proposed recently. Instead, highly individualized solutions for the specific data used in one’s own research seem inevitable. We believe that the proposed workflow can serve as guidance for practitioners, helping them to identify and address potential problems early and avoid some common pitfalls.
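As an illustration only, three of the workflow steps named in the description (matching data from different sources, deidentifying them, and detecting errors) might be sketched in Python as follows. This is not the authors' pipeline: the field names, the toy records, the secret key, and the plausibility range of 20 to 300 for heart rate are all hypothetical assumptions. Keyed hashing of an identifier is pseudonymization, which on its own is generally not sufficient for full deidentification.

```python
import hashlib
import hmac

# Hypothetical key; a real project would store it outside the research environment.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(patient_id, key=SECRET_KEY):
    """Map a patient identifier to a stable keyed-hash pseudonym."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def match_sources(demographics, measurements):
    """Join measurement rows to demographic rows on patient_id.

    Returns the joined rows and the measurement rows with no demographic match.
    """
    by_id = {row["patient_id"]: row for row in demographics}
    matched, unmatched = [], []
    for row in measurements:
        demo = by_id.get(row["patient_id"])
        if demo is None:
            unmatched.append(row)
        else:
            matched.append({**demo, **row})
    return matched, unmatched


def deidentify(rows, direct_identifiers=("patient_id", "name")):
    """Drop direct identifiers and attach a pseudonym instead."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in direct_identifiers}
        kept["pseudonym"] = pseudonymize(row["patient_id"])
        cleaned.append(kept)
    return cleaned


def flag_implausible(rows, field, low, high):
    """Return rows whose value for `field` lies outside [low, high]."""
    return [row for row in rows if not (low <= row[field] <= high)]


# Toy records, invented for this sketch.
demographics = [
    {"patient_id": "P1", "name": "A", "birth_year": 1950},
    {"patient_id": "P2", "name": "B", "birth_year": 1980},
]
measurements = [
    {"patient_id": "P1", "heart_rate": 72},
    {"patient_id": "P2", "heart_rate": 720},  # likely a data entry error
    {"patient_id": "P9", "heart_rate": 80},   # no demographic record
]

matched, unmatched = match_sources(demographics, measurements)
clean = deidentify(matched)
suspicious = flag_implausible(clean, "heart_rate", 20, 300)
```

Returning unmatched and implausible rows instead of silently dropping them keeps them available for the manual inspection that the description says raw medical data demand.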
format Online
Article
Text
id pubmed-9636533
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-9636533 2022-11-06 Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities Maletzky, Alexander; Böck, Carl; Tschoellitsch, Thomas; Roland, Theresa; Ludwig, Helga; Thumfart, Stefan; Giretzlehner, Michael; Hochreiter, Sepp; Meier, Jens JMIR Med Inform Viewpoint JMIR Publications 2022-10-21 /pmc/articles/PMC9636533/ /pubmed/36269654 http://dx.doi.org/10.2196/38557 Text en ©Alexander Maletzky, Carl Böck, Thomas Tschoellitsch, Theresa Roland, Helga Ludwig, Stefan Thumfart, Michael Giretzlehner, Sepp Hochreiter, Jens Meier. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 21.10.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities
topic Viewpoint