
Deep Learning-Based Natural Language Processing in Radiology: The Impact of Report Complexity, Disease Prevalence, Dataset Size, and Algorithm Type on Model Performance

In radiology, natural language processing (NLP) allows the extraction of valuable information from radiology reports. It can be used for various downstream tasks such as quality improvement, epidemiological research, and monitoring guideline adherence. Class imbalance, variation in dataset size, variation in report complexity, and algorithm type all influence NLP performance, but they have not yet been evaluated systematically and in relation to one another. In this study, we investigate the effect of these factors on the performance of four types of deep learning-based NLP models: a fully connected neural network (Dense), a long short-term memory recurrent neural network (LSTM), a convolutional neural network (CNN), and a Bidirectional Encoder Representations from Transformers (BERT) model. Two datasets of radiologist-annotated reports, one of trauma radiographs (n = 2469) and one of chest radiographs and computed tomography (CT) studies (n = 2255), were split into training sets (80%) and test sets (20%). The training data were used to train all four model types in 84 experiments (Fracture-data) and 45 experiments (Chest-data) with varying training-set size and prevalence. Performance was evaluated using sensitivity, specificity, positive predictive value, negative predictive value, area under the curve, and F score. Applied to the radiology reports, all four model architectures demonstrated high performance, with metrics reaching values above 0.90. The BERT model outperformed the CNN, LSTM, and Dense models because its results remained stable despite variation in training-set size and prevalence. Awareness of variation in prevalence is warranted because it affects sensitivity and specificity in opposite directions. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10916-021-01761-4.
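
The following sketch is not part of the original record and is not the authors' code; it only illustrates, assuming scikit-learn-style label and probability arrays, how the evaluation metrics listed in the abstract could be computed for a single binary report-classification experiment. The helper name report_metrics is hypothetical.

# Minimal sketch (illustrative only): computing sensitivity, specificity, PPV, NPV,
# AUC, and F score for one binary report-labelling experiment.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, f1_score

def report_metrics(y_true, y_pred, y_score):
    # y_true, y_pred: 0/1 labels per report; y_score: predicted probability of the positive class
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),   # positive predictive value (precision)
        "npv": tn / (tn + fn),   # negative predictive value
        "auc": roc_auc_score(y_true, y_score),
        "f_score": f1_score(y_true, y_pred),
    }

# Toy labels for illustration only (not study data):
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])
y_pred = (y_score >= 0.5).astype(int)
print(report_metrics(y_true, y_pred, y_score))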


Bibliographic Details
Main Authors: Olthof, A. W., van Ooijen, P. M. A., Cornelissen, L. J.
Format: Online Article Text
Language: English
Published: Springer US, 2021 (online 2021-09-04)
Published in: J Med Syst
Subjects: Systems-Level Quality Improvement
Rights: © The Author(s) 2021. Open Access under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)
Record: pubmed-8416876 (PubMed collection, National Center for Biotechnology Information; MEDLINE/PubMed record format)
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8416876/
https://www.ncbi.nlm.nih.gov/pubmed/34480231
http://dx.doi.org/10.1007/s10916-021-01761-4