
Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer

BACKGROUND: Medical imaging is critical in clinical practice, and high value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports. OBJECTIVE: The purpose of this study was to establish an ensemble learning classification mode...


Bibliographic Details
Main authors: Liu, Wenjuan, Zhang, Xi, Lv, Han, Li, Jia, Liu, Yawen, Yang, Zhenghan, Weng, Xutao, Lin, Yucong, Song, Hong, Wang, Zhenchang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9720132/
https://www.ncbi.nlm.nih.gov/pubmed/36479085
http://dx.doi.org/10.3389/fonc.2022.913806
author Liu, Wenjuan
Zhang, Xi
Lv, Han
Li, Jia
Liu, Yawen
Yang, Zhenghan
Weng, Xutao
Lin, Yucong
Song, Hong
Wang, Zhenchang
author_sort Liu, Wenjuan
collection PubMed
description BACKGROUND: Medical imaging is critical in clinical practice, and high-value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports.
OBJECTIVE: The purpose of this study was to establish an ensemble learning classification model using natural language processing (NLP), applied to the Chinese free text of radiological reports, to determine their value for liver lesion detection in patients with colorectal cancer (CRC).
METHODS: Radiological reports of upper abdominal computed tomography (CT) and magnetic resonance imaging (MRI) were divided into five categories according to the results of liver lesion detection in patients with CRC. NLP methods, including word segmentation, stop word removal, and n-gram language model establishment, were applied to each dataset. Then a bag-of-words model was built, high-frequency words were selected as features, and an ensemble learning classification model was constructed. Several machine learning methods were applied, including logistic regression (LR), random forest (RF), and others. We compared the accuracy of a priori selection of pertinent word strings with that of our machine learning methods.
RESULTS: The dataset of 2790 patients included CT without contrast (10.2%), CT with/without contrast (73.3%), MRI without contrast (1.8%), and MRI with/without contrast (14.6%). The ensemble learning classification model determined the value of reports effectively, reaching an accuracy of 95.91% in the CT with/without contrast dataset using XGBoost. The logistic regression, random forest, and support vector machine models also achieved good classification accuracy, reaching 95.89%, 95.04%, and 95.00%, respectively. The results of XGBoost were visualized using a confusion matrix. The numbers of errors in categories I, II, and V were very small. ELI5 was used to select important words for each category. Words such as “no abnormality”, “suggest”, “fatty liver”, and “transfer” showed a relatively large degree of positive correlation with classification accuracy. The accuracy of the string pattern search model was lower than that of the machine learning models.
CONCLUSIONS: The learning classification model based on NLP was an effective tool for determining the value of radiological reports focused on liver lesions. The study made it possible to analyze the value of medical imaging examinations on a large scale.
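The feature-extraction steps named in METHODS (segmentation, stop word removal, n-grams, a bag-of-words model, and high-frequency feature selection) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The stop-word list, the English toy reports, the `top_k` cutoff, and all function names are assumptions made for illustration, and whitespace splitting stands in for Chinese word segmentation.

```python
from collections import Counter

# Hypothetical stop-word list; the abstract does not give the one used in the study.
STOP_WORDS = {"the", "of", "in", "is", "and"}


def preprocess(report):
    """Split on whitespace (a stand-in for word segmentation) and drop stop words."""
    return [tok for tok in report.lower().split() if tok not in STOP_WORDS]


def ngrams(tokens, n_max=2):
    """Return every 1..n_max-gram of the token list as a space-joined string."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(reports, top_k=5, n_max=2):
    """Keep the top_k most frequent n-grams across all reports as features."""
    counts = Counter()
    for report in reports:
        counts.update(ngrams(preprocess(report), n_max))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def vectorize(report, vocabulary, n_max=2):
    """Map one report to a bag-of-words count vector over the vocabulary."""
    counts = Counter(ngrams(preprocess(report), n_max))
    return [counts[term] for term in vocabulary]


# Invented toy reports, not data from the study.
reports = [
    "no abnormality in the liver",
    "fatty liver suggest follow up",
    "no abnormality of the liver",
]
vocabulary = build_vocabulary(reports, top_k=5)
vectors = [vectorize(r, vocabulary) for r in reports]
```

Count vectors of this kind would then be passed to classifiers such as those named in the abstract (logistic regression, random forest, support vector machine, XGBoost). That step is omitted here because the abstract gives no model settings.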
format Online
Article
Text
id pubmed-9720132
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9720132 2022-12-06 Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer Liu, Wenjuan Zhang, Xi Lv, Han Li, Jia Liu, Yawen Yang, Zhenghan Weng, Xutao Lin, Yucong Song, Hong Wang, Zhenchang Front Oncol Oncology Frontiers Media S.A. 2022-11-21 /pmc/articles/PMC9720132/ /pubmed/36479085 http://dx.doi.org/10.3389/fonc.2022.913806 Text en Copyright © 2022 Liu, Zhang, Lv, Li, Liu, Yang, Weng, Lin, Song and Wang https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer
topic Oncology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9720132/
https://www.ncbi.nlm.nih.gov/pubmed/36479085
http://dx.doi.org/10.3389/fonc.2022.913806