
Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models

OBJECTIVE: A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from German unstructured clinical routine data from the cardiology domain enabling their usage in state-of-the-art data-driven deep learning projects. METHODS: We evaluated pre-trained language models to extract a set of 12 cardiovascular concepts in German discharge letters. We compared three bidirectional encoder representations from transformers pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. RESULTS: Our best performing model, based on publicly available German pre-trained bidirectional encoder representations from the transformer model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). CONCLUSION: Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models for the task of cardiovascular concept extraction using limited training data. This minimizes annotation efforts, which are currently the bottleneck of any application of data-driven deep learning projects in the clinical domain for German and many other European languages.


Bibliographic Details
Main Authors: Richter-Pechanski, Phillip, Geis, Nicolas A, Kiriakou, Christina, Schwab, Dominic M, Dieterich, Christoph
Format: Online Article Text
Language: English
Published: SAGE Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637713/
https://www.ncbi.nlm.nih.gov/pubmed/34868618
http://dx.doi.org/10.1177/20552076211057662
_version_ 1784608801059504128
author Richter-Pechanski, Phillip
Geis, Nicolas A
Kiriakou, Christina
Schwab, Dominic M
Dieterich, Christoph
author_sort Richter-Pechanski, Phillip
collection PubMed
description OBJECTIVE: A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from German unstructured clinical routine data from the cardiology domain enabling their usage in state-of-the-art data-driven deep learning projects. METHODS: We evaluated pre-trained language models to extract a set of 12 cardiovascular concepts in German discharge letters. We compared three bidirectional encoder representations from transformers pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. RESULTS: Our best performing model, based on publicly available German pre-trained bidirectional encoder representations from the transformer model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). CONCLUSION: Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models for the task of cardiovascular concept extraction using limited training data. This minimizes annotation efforts, which are currently the bottleneck of any application of data-driven deep learning projects in the clinical domain for German and many other European languages.
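The token-wise micro-averaged F1-score reported in the abstract pools true positives, false positives, and false negatives across all 12 concept classes before computing precision and recall, so frequent concepts weigh more heavily than rare ones. A minimal sketch of that metric follows; the label names (`DIAG`, `MED`) and the `O` outside tag are illustrative assumptions, not the study's actual annotation scheme:

```python
def token_micro_f1(gold, pred, outside="O"):
    """Token-wise micro-averaged F1 over all concept labels.

    Counts are pooled across classes: a token predicted with the wrong
    concept label counts as both a false positive (for the predicted
    class) and a false negative (for the gold class).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != outside and p == g:
            tp += 1
        else:
            if p != outside:   # predicted a concept that is wrong
                fp += 1
            if g != outside:   # missed (or mislabeled) a gold concept
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-token example: one gold DIAG token is missed.
gold = ["O", "DIAG", "DIAG", "O", "MED"]
pred = ["O", "DIAG", "O", "O", "MED"]
score = token_micro_f1(gold, pred)  # precision 1.0, recall 2/3, F1 0.8
```

In practice the same pooled counts are what `sklearn.metrics.f1_score` with `average="micro"` computes over the non-outside labels; the hand-rolled version above just makes the token-wise pooling explicit.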
format Online
Article
Text
id pubmed-8637713
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-8637713 2021-12-03 Digit Health Original Research
SAGE Publications 2021-11-26 /pmc/articles/PMC8637713/ /pubmed/34868618 http://dx.doi.org/10.1177/20552076211057662 Text en © The Author(s) 2021. This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models
topic Original Research