
Ad Hoc Information Extraction for Clinical Data Warehouses

Background: Clinical Data Warehouses (CDW) reuse electronic health records (EHR) to make their data retrievable for research purposes or patient recruitment for clinical trials. However, much information is hidden in unstructured data like discharge letters. They can be preprocessed and converted t...


Bibliographic Details
Main authors: Dietrich, Georg, Krebs, Jonathan, Fette, Georg, Ertl, Maximilian, Kaspar, Mathias, Störk, Stefan, Puppe, Frank
Format: Online Article Text
Language: English
Published: Schattauer GmbH 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6193399/
https://www.ncbi.nlm.nih.gov/pubmed/29801178
http://dx.doi.org/10.3414/ME17-02-0010
_version_ 1783364064366297088
author Dietrich, Georg
Krebs, Jonathan
Fette, Georg
Ertl, Maximilian
Kaspar, Mathias
Störk, Stefan
Puppe, Frank
collection PubMed
description Background: Clinical Data Warehouses (CDW) reuse electronic health records (EHR) to make their data retrievable for research purposes or patient recruitment for clinical trials. However, much information is hidden in unstructured data like discharge letters. Such texts can be preprocessed and converted to structured data via information extraction (IE), which is unfortunately a laborious task and therefore usually not available for most of the text data in a CDW. Objectives: The goal of our work is to provide an ad hoc IE service that allows users to query text data in a manner similar to querying structured data in a CDW. While search engines just return text snippets, our system also returns frequencies (e.g., how many patients have “heart failure”, including textual synonyms, or how many patients have an LVEF < 45) based on the content of discharge letters or textual reports for special investigations like heart echo. Three subtasks are addressed: (1) to recognize and exclude negations and their scopes, (2) to extract concepts, i.e., Boolean values, and (3) to extract numerical values. Methods: We implemented an extended version of the NegEx algorithm for German texts that detects negations and determines their scope. Furthermore, our document-oriented CDW PaDaWaN was extended with query functions, e.g., context-sensitive queries and regex queries, and an extraction mode for computing the frequencies for Boolean and numerical values. Results: Evaluations on chest X-ray reports and discharge letters showed high F1-scores for the three subtasks: detection of negated concepts in chest X-ray reports with an F1-score of 0.99 and in discharge letters with 0.97; of Boolean values in chest X-ray reports about 0.99; and of numerical values in chest X-ray reports and discharge letters also around 0.99, with the exception of the concept age.
Discussion: The advantages of an ad hoc IE over a standard IE are the low development effort (just entering the concept with its variants), the promptness of the results, and the adaptability by the user to his or her particular question. The disadvantages are usually lower accuracy and confidence. This ad hoc information extraction approach is novel and goes beyond existing systems: Roogle [1] extracts predefined concepts from texts at preprocessing and makes them retrievable at runtime. Dr. Warehouse [2] applies negation detection and indexes the produced subtexts, which include affirmed findings. Our approach combines negation detection and the extraction of concepts. However, the extraction takes place at runtime rather than during preprocessing. This provides an ad hoc, dynamic, interactive, and adjustable information extraction of arbitrary concepts and even their values on the fly. Conclusions: We developed an ad hoc information extraction query feature for Boolean and numerical values within a CDW with high recall and precision, based on a pipeline that detects and removes negations and their scope in clinical texts.
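The three subtasks named in the abstract (removing negated spans, extracting Boolean concepts, extracting numerical values) can be illustrated with a minimal Python sketch. It is not the authors' PaDaWaN implementation or their German NegEx extension: the trigger lists, the English sample reports, the LVEF pattern, and all function names below are assumptions made up for illustration, and the scope rule (a negation runs from its trigger to the next scope-ending word or the end of the sentence) is a strong simplification of NegEx.

```python
import re

# Hypothetical, tiny trigger lists for illustration only; a real NegEx
# lexicon is far larger, and the paper works on German texts.
NEG_TRIGGERS = ("no", "without", "denies")
SCOPE_ENDS = ("but", "however")


def strip_negated(text):
    """Drop every token between a negation trigger and the end of its scope."""
    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words, negated = [], False
        for token in sentence.split():
            bare = token.strip(".,;:!?").lower()
            if bare in NEG_TRIGGERS:
                negated = True       # scope opens at the trigger
                continue
            if bare in SCOPE_ENDS:
                negated = False      # scope closes at a terminating word
                continue
            if not negated:
                words.append(token)
        kept.append(" ".join(words))
    return " ".join(kept)


def has_concept(text, variants):
    """Boolean value: is any variant of the concept affirmed in the text?"""
    affirmed = strip_negated(text).lower()
    return any(v.lower() in affirmed for v in variants)


def numeric_value(text, pattern):
    """Numerical value: first number captured by the pattern, or None."""
    match = re.search(pattern, strip_negated(text), flags=re.IGNORECASE)
    return float(match.group(1).replace(",", ".")) if match else None


def count_patients(docs, predicate):
    """Count patients with at least one report that satisfies the predicate."""
    return sum(1 for texts in docs.values() if any(predicate(t) for t in texts))


# Made-up sample data: patient id -> list of report texts.
docs = {
    "p1": ["Heart failure known. LVEF 40 %."],
    "p2": ["No heart failure. LVEF 60 %."],
    "p3": ["Cardiac insufficiency, LVEF: 35 %."],
}
LVEF_PATTERN = r"LVEF\s*:?\s*(\d+(?:[.,]\d+)?)"


def lvef_below_45(text):
    value = numeric_value(text, LVEF_PATTERN)
    return value is not None and value < 45


hf = count_patients(
    docs, lambda t: has_concept(t, ["heart failure", "cardiac insufficiency"])
)
low_ef = count_patients(docs, lvef_below_45)
print(hf, low_ef)
```

In this toy data the negated mention in the second patient's report is removed before matching, so that patient is not counted for the concept. Because the concept variants and the pattern are supplied at call time, the sketch mirrors the ad hoc character of the approach: nothing is extracted until a query is issued.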
format Online
Article
Text
id pubmed-6193399
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Schattauer GmbH
record_format MEDLINE/PubMed
spelling pubmed-6193399 2018-11-23 Ad Hoc Information Extraction for Clinical Data Warehouses Dietrich, Georg Krebs, Jonathan Fette, Georg Ertl, Maximilian Kaspar, Mathias Störk, Stefan Puppe, Frank Methods Inf Med
Schattauer GmbH 2018-05 2018-05-25 /pmc/articles/PMC6193399/ /pubmed/29801178 http://dx.doi.org/10.3414/ME17-02-0010 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits unrestricted reproduction and distribution, for non-commercial purposes only; and use and reproduction, but not distribution, of adapted material for non-commercial purposes only, provided the original work is properly cited.
title Ad Hoc Information Extraction for Clinical Data Warehouses