
Information extraction from German radiological reports for general clinical text and language understanding

Bibliographic Details
Main Authors:	Jantscher, Michael; Gunzer, Felix; Kern, Roman; Hassler, Eva; Tschauner, Sebastian; Reishofer, Gernot
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9911592/ https://www.ncbi.nlm.nih.gov/pubmed/36759679 http://dx.doi.org/10.1038/s41598-023-29323-3

collection	PubMed
description	Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit, as many clinical downstream tasks rely on information from unstructured clinical documents. However, for low-resource languages like German, modern text processing applications that require a large amount of training data are difficult to use, because only a few data sets are available, mainly due to legal restrictions. In this study, we present an information extraction framework that was initially pre-trained on real-world computed tomographic (CT) reports of head examinations, followed by domain-adaptive fine-tuning on reports from different imaging examinations. We show that in the pre-training phase, the semantic and contextual meaning of one clinical reporting domain can be captured and effectively transferred to foreign clinical imaging examinations. Moreover, we introduce an active learning approach with an intrinsic strategic sampling method to generate highly informative training data with low human annotation cost. We find that model performance can be significantly improved by an appropriate selection of the data to be annotated, without the need to train the model on a specific downstream task. With a general annotation scheme that can be used not only in the radiology field but also in a broader clinical setting, we contribute to a more consistent labeling and annotation process that also facilitates the verification and evaluation of language models in the German clinical setting.
format	Online Article Text
id	pubmed-9911592
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-9911592 2023-02-11 Sci Rep Article Nature Publishing Group UK 2023-02-09 /pmc/articles/PMC9911592/ /pubmed/36759679 http://dx.doi.org/10.1038/s41598-023-29323-3 Text en © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	Information extraction from German radiological reports for general clinical text and language understanding
