Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models
Main authors: | Richter-Pechanski, Phillip; Geis, Nicolas A; Kiriakou, Christina; Schwab, Dominic M; Dieterich, Christoph |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637713/ https://www.ncbi.nlm.nih.gov/pubmed/34868618 http://dx.doi.org/10.1177/20552076211057662 |
_version_ | 1784608801059504128 |
---|---|
author | Richter-Pechanski, Phillip Geis, Nicolas A Kiriakou, Christina Schwab, Dominic M Dieterich, Christoph |
author_facet | Richter-Pechanski, Phillip Geis, Nicolas A Kiriakou, Christina Schwab, Dominic M Dieterich, Christoph |
author_sort | Richter-Pechanski, Phillip |
collection | PubMed |
description | OBJECTIVE: A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from German unstructured clinical routine data from the cardiology domain enabling their usage in state-of-the-art data-driven deep learning projects. METHODS: We evaluated pre-trained language models to extract a set of 12 cardiovascular concepts in German discharge letters. We compared three bidirectional encoder representations from transformers pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. RESULTS: Our best performing model, based on publicly available German pre-trained bidirectional encoder representations from the transformer model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). CONCLUSION: Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models for the task of cardiovascular concept extraction using limited training data. This minimizes annotation efforts, which are currently the bottleneck of any application of data-driven deep learning projects in the clinical domain for German and many other European languages. |
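The abstract reports a token-wise micro-average F1-score for the concept extraction models. As an aside, that metric can be sketched as follows; this is an illustrative computation only, not the authors' evaluation code, and the concept labels (`DIAG`, `MED`) are hypothetical placeholders for the 12 cardiovascular concepts:

```python
def token_micro_f1(gold, pred, outside="O"):
    """Token-wise micro-averaged F1 over concept labels.

    Micro-averaging pools true positives, false positives and false
    negatives across all concept labels before computing precision
    (positive predictive value) and recall (sensitivity). The "O"
    (outside, i.e. no concept) label is excluded, as is common in
    concept extraction evaluation.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == outside and p == outside:
            continue  # true negatives do not enter micro P/R
        if g == p:
            tp += 1
        else:
            if p != outside:
                fp += 1  # predicted a concept that is wrong
            if g != outside:
                fn += 1  # missed (or mislabeled) a gold concept
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because counts are pooled across labels before averaging, frequent concepts dominate the score, which is why micro-averaging is often paired with the precision/recall trade-off the abstract highlights.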
format | Online Article Text |
id | pubmed-8637713 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-8637713 2021-12-03 Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models Richter-Pechanski, Phillip Geis, Nicolas A Kiriakou, Christina Schwab, Dominic M Dieterich, Christoph Digit Health Original Research OBJECTIVE: A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from German unstructured clinical routine data from the cardiology domain enabling their usage in state-of-the-art data-driven deep learning projects. METHODS: We evaluated pre-trained language models to extract a set of 12 cardiovascular concepts in German discharge letters. We compared three bidirectional encoder representations from transformers pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. RESULTS: Our best performing model, based on publicly available German pre-trained bidirectional encoder representations from the transformer model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). CONCLUSION: Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models for the task of cardiovascular concept extraction using limited training data. This minimizes annotation efforts, which are currently the bottleneck of any application of data-driven deep learning projects in the clinical domain for German and many other European languages. |
SAGE Publications 2021-11-26 /pmc/articles/PMC8637713/ /pubmed/34868618 http://dx.doi.org/10.1177/20552076211057662 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
spellingShingle | Original Research Richter-Pechanski, Phillip Geis, Nicolas A Kiriakou, Christina Schwab, Dominic M Dieterich, Christoph Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title | Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title_full | Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title_fullStr | Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title_full_unstemmed | Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title_short | Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models |
title_sort | automatic extraction of 12 cardiovascular concepts from german discharge letters using pre-trained language models |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637713/ https://www.ncbi.nlm.nih.gov/pubmed/34868618 http://dx.doi.org/10.1177/20552076211057662 |
work_keys_str_mv | AT richterpechanskiphillip automaticextractionof12cardiovascularconceptsfromgermandischargelettersusingpretrainedlanguagemodels AT geisnicolasa automaticextractionof12cardiovascularconceptsfromgermandischargelettersusingpretrainedlanguagemodels AT kiriakouchristina automaticextractionof12cardiovascularconceptsfromgermandischargelettersusingpretrainedlanguagemodels AT schwabdominicm automaticextractionof12cardiovascularconceptsfromgermandischargelettersusingpretrainedlanguagemodels AT dieterichchristoph automaticextractionof12cardiovascularconceptsfromgermandischargelettersusingpretrainedlanguagemodels |