Fine-grained information extraction from German transthoracic echocardiography reports

BACKGROUND: Information extraction techniques that derive structured representations from unstructured data make a large amount of clinically relevant patient information accessible to semantic applications. These methods typically rely on standardized terminologies that guide the process. M...

Bibliographic Details
Main authors: Toepfer, Martin; Corovic, Hamo; Fette, Georg; Klügl, Peter; Störk, Stefan; Puppe, Frank
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643516/
https://www.ncbi.nlm.nih.gov/pubmed/26563260
http://dx.doi.org/10.1186/s12911-015-0215-x
_version_ 1782400527825371136
author Toepfer, Martin
Corovic, Hamo
Fette, Georg
Klügl, Peter
Störk, Stefan
Puppe, Frank
author_facet Toepfer, Martin
Corovic, Hamo
Fette, Georg
Klügl, Peter
Störk, Stefan
Puppe, Frank
author_sort Toepfer, Martin
collection PubMed
description BACKGROUND: Information extraction techniques that derive structured representations from unstructured data make a large amount of clinically relevant patient information accessible to semantic applications. These methods typically rely on standardized terminologies that guide the process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite their importance for clinical trials. This work therefore aimed to develop and evaluate an information extraction component with a fine-grained terminology that makes it possible to recognize almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg. METHODS: A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute-value pairs. The final component was mapped to the central elements of a standardized terminology and evaluated on documents with different layouts. RESULTS: The final system achieved state-of-the-art precision (micro average 0.996) and recall (micro average 0.961) on 100 test documents that represent more than 90% of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with F1 = 0.989 (micro average) and F1 = 0.963 (macro average). As a result of keyword matching and restrained concept extraction, the system also obtained high precision on unstructured or exceptionally short documents and on documents with uncommon layouts.
CONCLUSIONS: The developed terminology and the proposed information extraction system make it possible to extract fine-grained information from German semi-structured transthoracic echocardiography reports with very high precision and high recall on the majority of documents at the University Hospital of Würzburg. Extracted results populate a clinical data warehouse that supports clinical research. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12911-015-0215-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4643516
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4643516 2015-11-14 Fine-grained information extraction from German transthoracic echocardiography reports Toepfer, Martin Corovic, Hamo Fette, Georg Klügl, Peter Störk, Stefan Puppe, Frank BMC Med Inform Decis Mak Research Article BACKGROUND: Information extraction techniques that derive structured representations from unstructured data make a large amount of clinically relevant patient information accessible to semantic applications. These methods typically rely on standardized terminologies that guide the process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite their importance for clinical trials. This work therefore aimed to develop and evaluate an information extraction component with a fine-grained terminology that makes it possible to recognize almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg. METHODS: A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute-value pairs. The final component was mapped to the central elements of a standardized terminology and evaluated on documents with different layouts. RESULTS: The final system achieved state-of-the-art precision (micro average 0.996) and recall (micro average 0.961) on 100 test documents that represent more than 90% of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with F1 = 0.989 (micro average) and F1 = 0.963 (macro average).
As a result of keyword matching and restrained concept extraction, the system also obtained high precision on unstructured or exceptionally short documents and on documents with uncommon layouts. CONCLUSIONS: The developed terminology and the proposed information extraction system make it possible to extract fine-grained information from German semi-structured transthoracic echocardiography reports with very high precision and high recall on the majority of documents at the University Hospital of Würzburg. Extracted results populate a clinical data warehouse that supports clinical research. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12911-015-0215-x) contains supplementary material, which is available to authorized users. BioMed Central 2015-11-12 /pmc/articles/PMC4643516/ /pubmed/26563260 http://dx.doi.org/10.1186/s12911-015-0215-x Text en © Toepfer et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Toepfer, Martin
Corovic, Hamo
Fette, Georg
Klügl, Peter
Störk, Stefan
Puppe, Frank
Fine-grained information extraction from German transthoracic echocardiography reports
title Fine-grained information extraction from German transthoracic echocardiography reports
title_full Fine-grained information extraction from German transthoracic echocardiography reports
title_fullStr Fine-grained information extraction from German transthoracic echocardiography reports
title_full_unstemmed Fine-grained information extraction from German transthoracic echocardiography reports
title_short Fine-grained information extraction from German transthoracic echocardiography reports
title_sort fine-grained information extraction from german transthoracic echocardiography reports
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643516/
https://www.ncbi.nlm.nih.gov/pubmed/26563260
http://dx.doi.org/10.1186/s12911-015-0215-x
work_keys_str_mv AT toepfermartin finegrainedinformationextractionfromgermantransthoracicechocardiographyreports
AT corovichamo finegrainedinformationextractionfromgermantransthoracicechocardiographyreports
AT fettegeorg finegrainedinformationextractionfromgermantransthoracicechocardiographyreports
AT kluglpeter finegrainedinformationextractionfromgermantransthoracicechocardiographyreports
AT storkstefan finegrainedinformationextractionfromgermantransthoracicechocardiographyreports
AT puppefrank finegrainedinformationextractionfromgermantransthoracicechocardiographyreports