Developing a RadLex-Based Named Entity Recognition Tool for Mining Textual Radiology Reports: Development and Performance Evaluation Study
BACKGROUND: Named entity recognition (NER) plays an important role in extracting the features of descriptions such as the name and location of a disease for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities that can be extrac...
Main authors: | Tsuji, Shintaro; Wen, Andrew; Takahashi, Naoki; Zhang, Hongjian; Ogasawara, Katsuhiko; Jiang, Gouqian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8590187/ https://www.ncbi.nlm.nih.gov/pubmed/34714247 http://dx.doi.org/10.2196/25378 |
_version_ | 1784598902799859712 |
---|---|
author | Tsuji, Shintaro; Wen, Andrew; Takahashi, Naoki; Zhang, Hongjian; Ogasawara, Katsuhiko; Jiang, Gouqian |
author_facet | Tsuji, Shintaro; Wen, Andrew; Takahashi, Naoki; Zhang, Hongjian; Ogasawara, Katsuhiko; Jiang, Gouqian |
author_sort | Tsuji, Shintaro |
collection | PubMed |
description | BACKGROUND: Named entity recognition (NER) plays an important role in extracting features of descriptions, such as the name and location of a disease, when mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities that can be extracted depends on dictionary lookup. In particular, recognizing compound terms is complicated because of the variety of patterns. OBJECTIVE: The aim of this study is to develop and evaluate an NER tool that handles compound terms, using RadLex, for mining free-text radiology reports. METHODS: We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general purpose dictionary). We manually annotated 400 radiology reports for compound terms in noun phrases and used them as the gold standard for performance evaluation (precision, recall, and F-measure). In addition, we created a compound terms–enhanced dictionary (CtED) by analyzing false negatives and false positives and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: occurrence ratio (OR) and matching ratio (MR). RESULTS: The F-measure of cTAKES+RadLex+general purpose dictionary was 30.9% (precision 73.3% and recall 19.6%), and that of the combination with CtED was 63.1% (precision 82.8% and recall 51%). The OR indicated that the stem terms effusion, node, tube, and disease were used frequently, but compound terms were still not fully captured. The MR showed that 71.85% (9411/13,098) of the stem terms matched those in the ontologies, and RadLex improved the MR by approximately 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using the ontologies. CONCLUSIONS: We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that CtED and stem term analysis have the potential to improve dictionary-based NER performance by expanding vocabularies. |
format | Online Article Text |
id | pubmed-8590187 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-8590187 2021-12-07 Developing a RadLex-Based Named Entity Recognition Tool for Mining Textual Radiology Reports: Development and Performance Evaluation Study Tsuji, Shintaro; Wen, Andrew; Takahashi, Naoki; Zhang, Hongjian; Ogasawara, Katsuhiko; Jiang, Gouqian J Med Internet Res Original Paper BACKGROUND: Named entity recognition (NER) plays an important role in extracting features of descriptions, such as the name and location of a disease, when mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities that can be extracted depends on dictionary lookup. In particular, recognizing compound terms is complicated because of the variety of patterns. OBJECTIVE: The aim of this study is to develop and evaluate an NER tool that handles compound terms, using RadLex, for mining free-text radiology reports. METHODS: We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general purpose dictionary). We manually annotated 400 radiology reports for compound terms in noun phrases and used them as the gold standard for performance evaluation (precision, recall, and F-measure). In addition, we created a compound terms–enhanced dictionary (CtED) by analyzing false negatives and false positives and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: occurrence ratio (OR) and matching ratio (MR). RESULTS: The F-measure of cTAKES+RadLex+general purpose dictionary was 30.9% (precision 73.3% and recall 19.6%), and that of the combination with CtED was 63.1% (precision 82.8% and recall 51%). The OR indicated that the stem terms effusion, node, tube, and disease were used frequently, but compound terms were still not fully captured. The MR showed that 71.85% (9411/13,098) of the stem terms matched those in the ontologies, and RadLex improved the MR by approximately 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using the ontologies. CONCLUSIONS: We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that CtED and stem term analysis have the potential to improve dictionary-based NER performance by expanding vocabularies. JMIR Publications 2021-10-29 /pmc/articles/PMC8590187/ /pubmed/34714247 http://dx.doi.org/10.2196/25378 Text en ©Shintaro Tsuji, Andrew Wen, Naoki Takahashi, Hongjian Zhang, Katsuhiko Ogasawara, Gouqian Jiang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 29.10.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Tsuji, Shintaro Wen, Andrew Takahashi, Naoki Zhang, Hongjian Ogasawara, Katsuhiko Jiang, Gouqian Developing a RadLex-Based Named Entity Recognition Tool for Mining Textual Radiology Reports: Development and Performance Evaluation Study |
title | Developing a RadLex-Based Named Entity Recognition Tool for Mining Textual Radiology Reports: Development and Performance Evaluation Study |
title_sort | developing a radlex-based named entity recognition tool for mining textual radiology reports: development and performance evaluation study |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8590187/ https://www.ncbi.nlm.nih.gov/pubmed/34714247 http://dx.doi.org/10.2196/25378 |
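
The figures reported in the description field can be cross-checked with a little arithmetic. The sketch below is not taken from the article: it assumes the reported F-measure is the balanced F1 (harmonic mean of precision and recall), which the abstract does not state explicitly, and it reads the matching ratio as matched stem terms divided by total stem terms, following the fraction 9411/13,098 given in the abstract. The function name `f_measure` and the variable names are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
baseline_f = f_measure(73.3, 19.6)  # cTAKES + RadLex + general purpose dictionary
cted_f = f_measure(82.8, 51.0)      # combination with CtED

# Matching ratio (MR): matched stem terms over all stem terms, in percent.
matching_ratio = 100 * 9411 / 13098

print(f"baseline F-measure: {baseline_f:.1f}%")
print(f"CtED F-measure:     {cted_f:.1f}%")
print(f"matching ratio:     {matching_ratio:.2f}%")
```

Under these assumptions the computed values round to the reported 30.9%, 63.1%, and 71.85%, which suggests the abstract's numbers are internally consistent.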