Information extraction from German radiological reports for general clinical text and language understanding
Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit as many clinical downstream tasks rely on information from unstructured clinical documents. Ho...
Main authors: | Jantscher, Michael; Gunzer, Felix; Kern, Roman; Hassler, Eva; Tschauner, Sebastian; Reishofer, Gernot |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9911592/ https://www.ncbi.nlm.nih.gov/pubmed/36759679 http://dx.doi.org/10.1038/s41598-023-29323-3 |
_version_ | 1784885019787919360 |
---|---|
author | Jantscher, Michael Gunzer, Felix Kern, Roman Hassler, Eva Tschauner, Sebastian Reishofer, Gernot |
author_facet | Jantscher, Michael Gunzer, Felix Kern, Roman Hassler, Eva Tschauner, Sebastian Reishofer, Gernot |
author_sort | Jantscher, Michael |
collection | PubMed |
description | Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit as many clinical downstream tasks rely on information from unstructured clinical documents. However, for low-resource languages like German, the use of modern text processing applications that require a large amount of training data proves to be difficult, as only few data sets are available mainly due to legal restrictions. In this study, we present an information extraction framework that was initially pre-trained on real-world computed tomographic (CT) reports of head examinations, followed by domain adaptive fine-tuning on reports from different imaging examinations. We show that in the pre-training phase, the semantic and contextual meaning of one clinical reporting domain can be captured and effectively transferred to foreign clinical imaging examinations. Moreover, we introduce an active learning approach with an intrinsic strategic sampling method to generate highly informative training data with low human annotation cost. We see that the model performance can be significantly improved by an appropriate selection of the data to be annotated, without the need to train the model on a specific downstream task. With a general annotation scheme that can be used not only in the radiology field but also in a broader clinical setting, we contribute to a more consistent labeling and annotation process that also facilitates the verification and evaluation of language models in the German clinical setting. |
format | Online Article Text |
id | pubmed-9911592 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-99115922023-02-11 Information extraction from German radiological reports for general clinical text and language understanding Jantscher, Michael Gunzer, Felix Kern, Roman Hassler, Eva Tschauner, Sebastian Reishofer, Gernot Sci Rep Article Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit as many clinical downstream tasks rely on information from unstructured clinical documents. However, for low-resource languages like German, the use of modern text processing applications that require a large amount of training data proves to be difficult, as only few data sets are available mainly due to legal restrictions. In this study, we present an information extraction framework that was initially pre-trained on real-world computed tomographic (CT) reports of head examinations, followed by domain adaptive fine-tuning on reports from different imaging examinations. We show that in the pre-training phase, the semantic and contextual meaning of one clinical reporting domain can be captured and effectively transferred to foreign clinical imaging examinations. Moreover, we introduce an active learning approach with an intrinsic strategic sampling method to generate highly informative training data with low human annotation cost. We see that the model performance can be significantly improved by an appropriate selection of the data to be annotated, without the need to train the model on a specific downstream task. With a general annotation scheme that can be used not only in the radiology field but also in a broader clinical setting, we contribute to a more consistent labeling and annotation process that also facilitates the verification and evaluation of language models in the German clinical setting. 
Nature Publishing Group UK 2023-02-09 /pmc/articles/PMC9911592/ /pubmed/36759679 http://dx.doi.org/10.1038/s41598-023-29323-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Jantscher, Michael Gunzer, Felix Kern, Roman Hassler, Eva Tschauner, Sebastian Reishofer, Gernot Information extraction from German radiological reports for general clinical text and language understanding |
title | Information extraction from German radiological reports for general clinical text and language understanding |
title_full | Information extraction from German radiological reports for general clinical text and language understanding |
title_fullStr | Information extraction from German radiological reports for general clinical text and language understanding |
title_full_unstemmed | Information extraction from German radiological reports for general clinical text and language understanding |
title_short | Information extraction from German radiological reports for general clinical text and language understanding |
title_sort | information extraction from german radiological reports for general clinical text and language understanding |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9911592/ https://www.ncbi.nlm.nih.gov/pubmed/36759679 http://dx.doi.org/10.1038/s41598-023-29323-3 |