Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT)
Main authors: | Li, Jia; Lin, Yucong; Zhao, Pengfei; Liu, Wenjuan; Cai, Linkun; Sun, Jing; Zhao, Lei; Yang, Zhenghan; Song, Hong; Lv, Han; Wang, Zhenchang |
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9338483/ https://www.ncbi.nlm.nih.gov/pubmed/35907966 http://dx.doi.org/10.1186/s12911-022-01946-y |
_version_ | 1784759977213165568 |
author | Li, Jia Lin, Yucong Zhao, Pengfei Liu, Wenjuan Cai, Linkun Sun, Jing Zhao, Lei Yang, Zhenghan Song, Hong Lv, Han Wang, Zhenchang |
author_facet | Li, Jia Lin, Yucong Zhao, Pengfei Liu, Wenjuan Cai, Linkun Sun, Jing Zhao, Lei Yang, Zhenghan Song, Hong Lv, Han Wang, Zhenchang |
author_sort | Li, Jia |
collection | PubMed |
description | BACKGROUND: Given the increasing number of people suffering from tinnitus, the accurate categorization of patients with actionable reports is attractive in assisting clinical decision making. However, this process requires experienced physicians and significant human labor. Natural language processing (NLP) has shown great potential in big data analytics of medical texts; yet, its application to domain-specific analysis of radiology reports is limited. OBJECTIVE: The aim of this study is to propose a novel approach to classifying actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformers (BERT)-based models, and to evaluate the benefits of in-domain pre-training (IDPT) along with a sequence adaptation strategy. METHODS: A total of 5864 temporal bone computed tomography (CT) reports were labeled by two experienced radiologists as follows: (1) normal findings without notable lesions; (2) notable lesions but uncorrelated with tinnitus; and (3) at least one lesion considered a potential cause of tinnitus. We then constructed a framework consisting of deep learning (DL) neural networks and self-supervised BERT models. A tinnitus domain-specific corpus was used to pre-train the BERT model to further improve its embedding weights. In addition, we conducted an experiment evaluating multiple max-sequence-length settings in BERT to reduce unnecessary computation. After a comprehensive comparison of all metrics, we determined the most promising approach by comparing F1-scores and AUC values. RESULTS: In the first experiment, the fine-tuned BERT model achieved a more promising result (AUC 0.868, F1 0.760) than the Word2Vec-based models (AUC 0.767, F1 0.733) on validation data. In the second experiment, the BERT in-domain pre-training model (AUC 0.948, F1 0.841) performed significantly better than the baseline BERT model (AUC 0.868, F1 0.760). Additionally, among the variants of BERT fine-tuning models, Mengzi achieved the highest AUC of 0.878 (F1 0.764). Finally, we found that a BERT max sequence length of 128 tokens achieved an AUC of 0.866 (F1 0.736), almost equal to that of a max sequence length of 512 tokens (AUC 0.868, F1 0.760). CONCLUSION: We developed a reliable BERT-based framework for tinnitus diagnosis from Chinese radiology reports, along with a sequence adaptation strategy that reduces computational resources while maintaining accuracy. The findings could provide a reference for NLP development for Chinese radiology reports. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-022-01946-y. |
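The sequence adaptation strategy described in the abstract amounts to capping each tokenized report at 128 tokens rather than BERT's usual 512-token maximum before feeding it to the encoder. The sketch below is a minimal illustration of that truncate-or-pad step, not the authors' code: `adapt_sequence` is a hypothetical helper, and the whitespace split stands in for the WordPiece tokenizer a real BERT pipeline would use.

```python
def adapt_sequence(tokens, max_length=128, pad_token="[PAD]"):
    """Truncate or pad a token list to exactly `max_length` entries,
    reserving two slots for BERT's [CLS] and [SEP] markers."""
    body = tokens[: max_length - 2]                # truncate long reports
    seq = ["[CLS]"] + body + ["[SEP]"]
    seq += [pad_token] * (max_length - len(seq))   # pad short reports
    return seq

# A short report is padded up to the fixed length ...
report = "temporal bone CT shows no notable lesions".split()
short = adapt_sequence(report, max_length=12)

# ... while an overlong one is truncated down to it.
long_report = ["token"] * 600
capped = adapt_sequence(long_report, max_length=128)
```

Because most radiology reports are short, truncating at 128 tokens discards little content, which is consistent with the reported result that the 128-token setting (AUC 0.866) performs on par with the 512-token setting (AUC 0.868) at a fraction of the compute.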
format | Online Article Text |
id | pubmed-9338483 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-93384832022-07-31 Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT) Li, Jia Lin, Yucong Zhao, Pengfei Liu, Wenjuan Cai, Linkun Sun, Jing Zhao, Lei Yang, Zhenghan Song, Hong Lv, Han Wang, Zhenchang BMC Med Inform Decis Mak Research BioMed Central 2022-07-30 /pmc/articles/PMC9338483/ /pubmed/35907966 http://dx.doi.org/10.1186/s12911-022-01946-y Text en © The Author(s) 2022, licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/) |
spellingShingle | Research Li, Jia Lin, Yucong Zhao, Pengfei Liu, Wenjuan Cai, Linkun Sun, Jing Zhao, Lei Yang, Zhenghan Song, Hong Lv, Han Wang, Zhenchang Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT) |
title | Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT) |
title_full | Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT) |
title_fullStr | Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT) |
title_full_unstemmed | Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT) |
title_short | Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT) |
title_sort | automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (bert) and in-domain pre-training (idpt) |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9338483/ https://www.ncbi.nlm.nih.gov/pubmed/35907966 http://dx.doi.org/10.1186/s12911-022-01946-y |
work_keys_str_mv | AT lijia automatictextclassificationofactionableradiologyreportsoftinnituspatientsusingbidirectionalencoderrepresentationsfromtransformerbertandindomainpretrainingidpt AT linyucong automatictextclassificationofactionableradiologyreportsoftinnituspatientsusingbidirectionalencoderrepresentationsfromtransformerbertandindomainpretrainingidpt AT zhaopengfei automatictextclassificationofactionableradiologyreportsoftinnituspatientsusingbidirectionalencoderrepresentationsfromtransformerbertandindomainpretrainingidpt AT liuwenjuan automatictextclassificationofactionableradiologyreportsoftinnituspatientsusingbidirectionalencoderrepresentationsfromtransformerbertandindomainpretrainingidpt AT cailinkun automatictextclassificationofactionableradiologyreportsoftinnituspatientsusingbidirectionalencoderrepresentationsfromtransformerbertandindomainpretrainingidpt AT sunjing automatictextclassificationofactionableradiologyreportsoftinnituspatientsusingbidirectionalencoderrepresentationsfromtransformerbertandindomainpretrainingidpt AT zhaolei automatictextclassificationofactionableradiologyreportsoftinnituspatientsusingbidirectionalencoderrepresentationsfromtransformerbertandindomainpretrainingidpt AT yangzhenghan automatictextclassificationofactionableradiologyreportsoftinnituspatientsusingbidirectionalencoderrepresentationsfromtransformerbertandindomainpretrainingidpt AT songhong automatictextclassificationofactionableradiologyreportsoftinnituspatientsusingbidirectionalencoderrepresentationsfromtransformerbertandindomainpretrainingidpt AT lvhan automatictextclassificationofactionableradiologyreportsoftinnituspatientsusingbidirectionalencoderrepresentationsfromtransformerbertandindomainpretrainingidpt AT wangzhenchang automatictextclassificationofactionableradiologyreportsoftinnituspatientsusingbidirectionalencoderrepresentationsfromtransformerbertandindomainpretrainingidpt |