Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models
Main authors: | Richter-Pechanski, Phillip; Geis, Nicolas A; Kiriakou, Christina; Schwab, Dominic M; Dieterich, Christoph |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637713/ https://www.ncbi.nlm.nih.gov/pubmed/34868618 http://dx.doi.org/10.1177/20552076211057662 |
_version_ | 1784608801059504128 |
---|---|
author | Richter-Pechanski, Phillip Geis, Nicolas A Kiriakou, Christina Schwab, Dominic M Dieterich, Christoph |
author_facet | Richter-Pechanski, Phillip Geis, Nicolas A Kiriakou, Christina Schwab, Dominic M Dieterich, Christoph |
author_sort | Richter-Pechanski, Phillip |
collection | PubMed |
description | OBJECTIVE: A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from German unstructured clinical routine data from the cardiology domain enabling their usage in state-of-the-art data-driven deep learning projects. METHODS: We evaluated pre-trained language models to extract a set of 12 cardiovascular concepts in German discharge letters. We compared three bidirectional encoder representations from transformers pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. RESULTS: Our best performing model, based on publicly available German pre-trained bidirectional encoder representations from the transformer model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). CONCLUSION: Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models for the task of cardiovascular concept extraction using limited training data. This minimizes annotation efforts, which are currently the bottleneck of any application of data-driven deep learning projects in the clinical domain for German and many other European languages. |
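The abstract reports a token-wise micro-average F1-score for the concept extraction models. As an aside, that metric can be sketched as follows; this is an illustrative computation only, not the authors' evaluation code, and the concept labels (`DIAG`, `MED`) are hypothetical placeholders for the 12 cardiovascular concepts:

```python
def token_micro_f1(gold, pred, outside="O"):
    """Token-wise micro-averaged F1 over concept labels.

    Micro-averaging pools true positives, false positives and false
    negatives across all concept labels before computing precision
    (positive predictive value) and recall (sensitivity). The "O"
    (outside, i.e. no concept) label is excluded, as is common in
    concept extraction evaluation.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == outside and p == outside:
            continue  # true negatives do not enter micro P/R
        if g == p:
            tp += 1
        else:
            if p != outside:
                fp += 1  # predicted a concept that is wrong
            if g != outside:
                fn += 1  # missed (or mislabeled) a gold concept
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because counts are pooled across labels before averaging, frequent concepts dominate the score, which is why micro-averaging is often paired with the precision/recall trade-off the abstract highlights.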
format | Online Article Text |
id | pubmed-8637713 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-8637713 2021-12-03 Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models Richter-Pechanski, Phillip Geis, Nicolas A Kiriakou, Christina Schwab, Dominic M Dieterich, Christoph Digit Health Original Research OBJECTIVE: A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from German unstructured clinical routine data from the cardiology domain enabling their usage in state-of-the-art data-driven deep learning projects. METHODS: We evaluated pre-trained language models to extract a set of 12 cardiovascular concepts in German discharge letters. We compared three bidirectional encoder representations from transformers pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. RESULTS: Our best performing model, based on publicly available German pre-trained bidirectional encoder representations from the transformer model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). CONCLUSION: Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models for the task of cardiovascular concept extraction using limited training data. This minimizes annotation efforts, which are currently the bottleneck of any application of data-driven deep learning projects in the clinical domain for German and many other European languages. |
SAGE Publications 2021-11-26 /pmc/articles/PMC8637713/ /pubmed/34868618 http://dx.doi.org/10.1177/20552076211057662 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
spellingShingle | Original Research Richter-Pechanski, Phillip Geis, Nicolas A Kiriakou, Christina Schwab, Dominic M Dieterich, Christoph Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title | Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title_full | Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title_fullStr | Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title_full_unstemmed | Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title_short | Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title_sort | automatic extraction of 12 cardiovascular concepts from german discharge letters using pre-trained language models |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637713/ https://www.ncbi.nlm.nih.gov/pubmed/34868618 http://dx.doi.org/10.1177/20552076211057662 |
work_keys_str_mv | AT richterpechanskiphillip automaticextractionof12cardiovascularconceptsfromgermandischargelettersusingpretrainedlanguagemodels AT geisnicolasa automaticextractionof12cardiovascularconceptsfromgermandischargelettersusingpretrainedlanguagemodels AT kiriakouchristina automaticextractionof12cardiovascularconceptsfromgermandischargelettersusingpretrainedlanguagemodels AT schwabdominicm automaticextractionof12cardiovascularconceptsfromgermandischargelettersusingpretrainedlanguagemodels AT dieterichchristoph automaticextractionof12cardiovascularconceptsfromgermandischargelettersusingpretrainedlanguagemodels |