
758. High-Throughput Mining of Electronic Medical Records Using Generalizable Autonomous Scripts


Bibliographic Details
Main authors: Rochat, Ryan H, Demmler-Harrison, Gail J
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6811295/
http://dx.doi.org/10.1093/ofid/ofz360.826
author Rochat, Ryan H
Demmler-Harrison, Gail J
collection PubMed
description BACKGROUND: The electronic medical record (EMR) has become a modern compendium of health information, from broad clinical assessments down to an individual’s heart rate. The wealth of information in these EMRs holds promise for clinical discovery and hypothesis generation. Unfortunately, as these systems have become more robust, mining them for relevant clinical information is hindered by the overall data architecture and often requires the expertise of a clinical informatician to extract relevant data. However, because the information presented to the clinician through the digital workspace is derived from the core EMR database, the format is well structured and can be mined using text recognition and parsing scripts. METHODS: Here we present a program that can parse output from Epic Hyperspace®, generating a relational database of clinical information. To facilitate ease of use, our protocol capitalizes on the familiarity of Microsoft Excel® as an intermediary for storing the raw output from the EMR, with data parsing and processing scripts written in SAS V9.4 (Cary, North Carolina). RESULTS: As a proof of concept, we extracted the diagnosis codes and standard laboratories for 190 patients seen in our Congenital Cytomegalovirus Clinic at Texas Children’s Hospital in Houston, Texas. Manual extraction of these data into Microsoft Excel® took 1 hour, and the scripts to parse the data took less than 5 seconds to run. Data from these patients included 3800 ICD-10 codes (along with their metadata) and 33,000 individual laboratory values. In total, more than 850,000 characters were extracted from the EMR using this technique. Manual review of 10 randomly selected charts found the data in perfect concordance with the EMR, a direct reflection of the fidelity of the parsing scripts. On average, an experienced user was able to enter three ICD-10 codes and six individual laboratory values per minute. At these rates, the same extraction would have taken at least 110 hours using a conventional chart review technique. CONCLUSION: High-throughput data mining tools have the potential to improve the feasibility of studies dependent upon information stored in the EMR. When coupled with specific content knowledge, this approach can consolidate months of data collection into a day’s task. DISCLOSURES: All authors: No reported disclosures
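The 110-hour figure can be checked against the counts and entry rates reported in the abstract. The following is a minimal sketch of that arithmetic, assuming the average rates apply uniformly to every record; the function and constant names are illustrative and do not come from the authors' scripts.

```python
# Counts and manual entry rates as reported in the abstract.
ICD10_CODES = 3800
LAB_VALUES = 33_000
CODES_PER_MINUTE = 3
LABS_PER_MINUTE = 6


def manual_entry_hours(codes, labs, codes_per_min, labs_per_min):
    """Estimate hours of manual entry (hypothetical helper, not the authors' code)."""
    minutes = codes / codes_per_min + labs / labs_per_min
    return minutes / 60


hours = manual_entry_hours(ICD10_CODES, LAB_VALUES, CODES_PER_MINUTE, LABS_PER_MINUTE)
print(f"Estimated manual entry time: {hours:.1f} hours")  # prints 112.8 hours
```

Under these assumptions the estimate comes to roughly 113 hours, consistent with the stated lower bound of 110 hours, compared with about 1 hour of manual export plus a few seconds of script time.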
format Online
Article
Text
id pubmed-6811295
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6811295 2019-10-29 758. High-Throughput Mining of Electronic Medical Records Using Generalizable Autonomous Scripts Rochat, Ryan H Demmler-Harrison, Gail J Open Forum Infect Dis Abstracts Oxford University Press 2019-10-23 /pmc/articles/PMC6811295/ http://dx.doi.org/10.1093/ofid/ofz360.826 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of Infectious Diseases Society of America. http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title 758. High-Throughput Mining of Electronic Medical Records Using Generalizable Autonomous Scripts
topic Abstracts