Information extraction from German radiological reports for general clinical text and language understanding

Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit as many clinical downstream tasks rely on information from unstructured clinical documents. Ho...

Bibliographic Details
Main Authors: Jantscher, Michael; Gunzer, Felix; Kern, Roman; Hassler, Eva; Tschauner, Sebastian; Reishofer, Gernot
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9911592/
https://www.ncbi.nlm.nih.gov/pubmed/36759679
http://dx.doi.org/10.1038/s41598-023-29323-3
author Jantscher, Michael
Gunzer, Felix
Kern, Roman
Hassler, Eva
Tschauner, Sebastian
Reishofer, Gernot
collection PubMed
description Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit as many clinical downstream tasks rely on information from unstructured clinical documents. However, for low-resource languages like German, the use of modern text processing applications that require a large amount of training data proves to be difficult, as only few data sets are available mainly due to legal restrictions. In this study, we present an information extraction framework that was initially pre-trained on real-world computed tomographic (CT) reports of head examinations, followed by domain adaptive fine-tuning on reports from different imaging examinations. We show that in the pre-training phase, the semantic and contextual meaning of one clinical reporting domain can be captured and effectively transferred to foreign clinical imaging examinations. Moreover, we introduce an active learning approach with an intrinsic strategic sampling method to generate highly informative training data with low human annotation cost. We see that the model performance can be significantly improved by an appropriate selection of the data to be annotated, without the need to train the model on a specific downstream task. With a general annotation scheme that can be used not only in the radiology field but also in a broader clinical setting, we contribute to a more consistent labeling and annotation process that also facilitates the verification and evaluation of language models in the German clinical setting.
format Online
Article
Text
id pubmed-9911592
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9911592 2023-02-11 Information extraction from German radiological reports for general clinical text and language understanding Jantscher, Michael Gunzer, Felix Kern, Roman Hassler, Eva Tschauner, Sebastian Reishofer, Gernot Sci Rep Article
Nature Publishing Group UK 2023-02-09 /pmc/articles/PMC9911592/ /pubmed/36759679 http://dx.doi.org/10.1038/s41598-023-29323-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Information extraction from German radiological reports for general clinical text and language understanding
topic Article