
Developing a RadLex-Based Named Entity Recognition Tool for Mining Textual Radiology Reports: Development and Performance Evaluation Study

BACKGROUND: Named entity recognition (NER) plays an important role in extracting the features of descriptions such as the name and location of a disease for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities that can be extrac...


Bibliographic Details
Main authors: Tsuji, Shintaro, Wen, Andrew, Takahashi, Naoki, Zhang, Hongjian, Ogasawara, Katsuhiko, Jiang, Gouqian
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8590187/
https://www.ncbi.nlm.nih.gov/pubmed/34714247
http://dx.doi.org/10.2196/25378
_version_ 1784598902799859712
author Tsuji, Shintaro
Wen, Andrew
Takahashi, Naoki
Zhang, Hongjian
Ogasawara, Katsuhiko
Jiang, Gouqian
collection PubMed
description BACKGROUND: Named entity recognition (NER) plays an important role in extracting the features of descriptions such as the name and location of a disease for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities that can be extracted depends on the dictionary lookup. In particular, the recognition of compound terms is very complicated because of the variety of patterns. OBJECTIVE: The aim of this study is to develop and evaluate an NER tool concerned with compound terms using RadLex for mining free-text radiology reports. METHODS: We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary). We manually annotated 400 radiology reports for compound terms in noun phrases and used them as the gold standard for performance evaluation (precision, recall, and F-measure). In addition, we created a compound terms–enhanced dictionary (CtED) by analyzing false negatives and false positives and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: occurrence ratio (OR) and matching ratio (MR). RESULTS: The F-measure of cTAKES+RadLex+general-purpose dictionary was 30.9% (precision 73.3% and recall 19.6%) and that of the combined CtED was 63.1% (precision 82.8% and recall 51%). The OR indicated that the stem terms of effusion, node, tube, and disease were used frequently, but capture of compound terms was still lacking. The MR showed that 71.85% (9411/13,098) of the stem terms matched those of the ontologies, and RadLex improved the MR by approximately 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms would have the potential to help generate synonymous phrases using the ontologies. CONCLUSIONS: We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that CtED and stem term analysis have the potential to improve dictionary-based NER performance with regard to expanding vocabularies.
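The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is not from the paper: it assumes that "F-measure" means the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly, and the function and variable names are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is F1; it does not say so explicitly.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, written as fractions.
baseline = f_measure(0.733, 0.196)   # cTAKES + RadLex + general-purpose dictionary
with_cted = f_measure(0.828, 0.51)   # combined with CtED

# Matching ratio (MR): matched stem terms over all stem terms, as reported.
matching_ratio = 9411 / 13098

print(f"baseline F-measure: {baseline:.1%}")     # 30.9%
print(f"CtED F-measure:     {with_cted:.1%}")    # 63.1%
print(f"matching ratio:     {matching_ratio:.2%}")  # 71.85%
```

Under the F1 assumption, the recomputed values agree with the 30.9%, 63.1%, and 71.85% given in the RESULTS section.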
format Online
Article
Text
id pubmed-8590187
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8590187 2021-12-07 Developing a RadLex-Based Named Entity Recognition Tool for Mining Textual Radiology Reports: Development and Performance Evaluation Study Tsuji, Shintaro Wen, Andrew Takahashi, Naoki Zhang, Hongjian Ogasawara, Katsuhiko Jiang, Gouqian J Med Internet Res Original Paper BACKGROUND: Named entity recognition (NER) plays an important role in extracting the features of descriptions such as the name and location of a disease for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities that can be extracted depends on the dictionary lookup. In particular, the recognition of compound terms is very complicated because of the variety of patterns. OBJECTIVE: The aim of this study is to develop and evaluate an NER tool concerned with compound terms using RadLex for mining free-text radiology reports. METHODS: We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary). We manually annotated 400 radiology reports for compound terms in noun phrases and used them as the gold standard for performance evaluation (precision, recall, and F-measure). In addition, we created a compound terms–enhanced dictionary (CtED) by analyzing false negatives and false positives and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: occurrence ratio (OR) and matching ratio (MR). RESULTS: The F-measure of cTAKES+RadLex+general-purpose dictionary was 30.9% (precision 73.3% and recall 19.6%) and that of the combined CtED was 63.1% (precision 82.8% and recall 51%). The OR indicated that the stem terms of effusion, node, tube, and disease were used frequently, but capture of compound terms was still lacking. The MR showed that 71.85% (9411/13,098) of the stem terms matched those of the ontologies, and RadLex improved the MR by approximately 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms would have the potential to help generate synonymous phrases using the ontologies. CONCLUSIONS: We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that CtED and stem term analysis have the potential to improve dictionary-based NER performance with regard to expanding vocabularies. JMIR Publications 2021-10-29 /pmc/articles/PMC8590187/ /pubmed/34714247 http://dx.doi.org/10.2196/25378 Text en ©Shintaro Tsuji, Andrew Wen, Naoki Takahashi, Hongjian Zhang, Katsuhiko Ogasawara, Gouqian Jiang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 29.10.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Tsuji, Shintaro
Wen, Andrew
Takahashi, Naoki
Zhang, Hongjian
Ogasawara, Katsuhiko
Jiang, Gouqian
Developing a RadLex-Based Named Entity Recognition Tool for Mining Textual Radiology Reports: Development and Performance Evaluation Study
title Developing a RadLex-Based Named Entity Recognition Tool for Mining Textual Radiology Reports: Development and Performance Evaluation Study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8590187/
https://www.ncbi.nlm.nih.gov/pubmed/34714247
http://dx.doi.org/10.2196/25378