Comparison of Chest Radiograph Captions Based on Natural Language Processing vs Completed by Radiologists
IMPORTANCE: Artificial intelligence (AI) can interpret abnormal signs in chest radiography (CXR) and generate captions, but a prospective study is needed to examine its practical value. OBJECTIVE: To prospectively compare natural language processing (NLP)-generated CXR captions and the diagnostic findings of radiologists.
Main authors: Zhang, Yaping; Liu, Mingqian; Zhang, Lu; Wang, Lingyun; Zhao, Keke; Hu, Shundong; Chen, Xu; Xie, Xueqian
Format: Online Article Text
Language: English
Published: American Medical Association, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9909497/ https://www.ncbi.nlm.nih.gov/pubmed/36753278 http://dx.doi.org/10.1001/jamanetworkopen.2022.55113
_version_ | 1784884588346081280 |
author | Zhang, Yaping Liu, Mingqian Zhang, Lu Wang, Lingyun Zhao, Keke Hu, Shundong Chen, Xu Xie, Xueqian |
author_facet | Zhang, Yaping Liu, Mingqian Zhang, Lu Wang, Lingyun Zhao, Keke Hu, Shundong Chen, Xu Xie, Xueqian |
author_sort | Zhang, Yaping |
collection | PubMed |
description | IMPORTANCE: Artificial intelligence (AI) can interpret abnormal signs in chest radiography (CXR) and generate captions, but a prospective study is needed to examine its practical value. OBJECTIVE: To prospectively compare natural language processing (NLP)-generated CXR captions and the diagnostic findings of radiologists. DESIGN, SETTING, AND PARTICIPANTS: A multicenter diagnostic study was conducted. The training data set included CXR images and reports retrospectively collected from February 1, 2014, to February 28, 2018. The retrospective test data set included consecutive images and reports from April 1 to July 31, 2019. The prospective test data set included consecutive images and reports from May 1 to September 30, 2021. EXPOSURES: A bidirectional encoder representations from transformers (BERT) model was used to extract language entities and relationships from unstructured CXR reports to establish 23 labels of abnormal signs to train convolutional neural networks. The participants in the prospective test group were randomly assigned to 1 of 3 different caption generation models: a normal template, NLP-generated captions, and rule-based captions based on convolutional neural networks. For each case, a resident drafted the report based on the randomly assigned captions and an experienced radiologist finalized the report blinded to the original captions. A total of 21 residents and 19 radiologists were involved. MAIN OUTCOMES AND MEASURES: Time to write reports based on different caption generation models. RESULTS: The training data set consisted of 74 082 cases (39 254 [53.0%] women; mean [SD] age, 50.0 [17.1] years). In the retrospective (n = 8126; 4345 [53.5%] women; mean [SD] age, 47.9 [15.9] years) and prospective (n = 5091; 2416 [47.5%] women; mean [SD] age, 45.1 [15.6] years) test data sets, the mean (SD) area under the curve of abnormal signs was 0.87 (0.11) in the retrospective data set and 0.84 (0.09) in the prospective data set. The residents' mean (SD) reporting time using the NLP-generated model was 283 (37) seconds, significantly shorter than with the normal template (347 [58] seconds; P < .001) and the rule-based model (296 [46] seconds; P < .001). The NLP-generated captions showed the highest similarity to the final reports, with a mean (SD) bilingual evaluation understudy (BLEU) score of 0.69 (0.24), significantly higher than the normal template (0.37 [0.09]; P < .001) and the rule-based model (0.57 [0.19]; P < .001). CONCLUSIONS AND RELEVANCE: In this diagnostic study of NLP-generated CXR captions, prior information provided by NLP was associated with greater efficiency in the reporting process while maintaining good consistency with the findings of radiologists. (A sketch of the AUC and BLEU computations follows the record fields below.) |
format | Online Article Text |
id | pubmed-9909497 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Medical Association |
record_format | MEDLINE/PubMed |
spelling | pubmed-9909497 2023-02-10 Comparison of Chest Radiograph Captions Based on Natural Language Processing vs Completed by Radiologists Zhang, Yaping Liu, Mingqian Zhang, Lu Wang, Lingyun Zhao, Keke Hu, Shundong Chen, Xu Xie, Xueqian JAMA Netw Open Original Investigation American Medical Association 2023-02-08 /pmc/articles/PMC9909497/ /pubmed/36753278 http://dx.doi.org/10.1001/jamanetworkopen.2022.55113 Text en Copyright 2023 Zhang Y et al. JAMA Network Open. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License. |
spellingShingle | Original Investigation Zhang, Yaping Liu, Mingqian Zhang, Lu Wang, Lingyun Zhao, Keke Hu, Shundong Chen, Xu Xie, Xueqian Comparison of Chest Radiograph Captions Based on Natural Language Processing vs Completed by Radiologists |
title | Comparison of Chest Radiograph Captions Based on Natural Language Processing vs Completed by Radiologists |
title_full | Comparison of Chest Radiograph Captions Based on Natural Language Processing vs Completed by Radiologists |
title_fullStr | Comparison of Chest Radiograph Captions Based on Natural Language Processing vs Completed by Radiologists |
title_full_unstemmed | Comparison of Chest Radiograph Captions Based on Natural Language Processing vs Completed by Radiologists |
title_short | Comparison of Chest Radiograph Captions Based on Natural Language Processing vs Completed by Radiologists |
title_sort | comparison of chest radiograph captions based on natural language processing vs completed by radiologists |
topic | Original Investigation |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9909497/ https://www.ncbi.nlm.nih.gov/pubmed/36753278 http://dx.doi.org/10.1001/jamanetworkopen.2022.55113 |
work_keys_str_mv | AT zhangyaping comparisonofchestradiographcaptionsbasedonnaturallanguageprocessingvscompletedbyradiologists AT liumingqian comparisonofchestradiographcaptionsbasedonnaturallanguageprocessingvscompletedbyradiologists AT zhanglu comparisonofchestradiographcaptionsbasedonnaturallanguageprocessingvscompletedbyradiologists AT wanglingyun comparisonofchestradiographcaptionsbasedonnaturallanguageprocessingvscompletedbyradiologists AT zhaokeke comparisonofchestradiographcaptionsbasedonnaturallanguageprocessingvscompletedbyradiologists AT hushundong comparisonofchestradiographcaptionsbasedonnaturallanguageprocessingvscompletedbyradiologists AT chenxu comparisonofchestradiographcaptionsbasedonnaturallanguageprocessingvscompletedbyradiologists AT xiexueqian comparisonofchestradiographcaptionsbasedonnaturallanguageprocessingvscompletedbyradiologists |
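The abstract reports two quantitative evaluations: a mean per-label area under the curve (AUC) for the 23 abnormal-sign classifiers, and a BLEU score comparing each caption variant with the radiologist-finalized report. The study does not publish its tooling, so the following Python sketch shows one plausible way to compute both metrics with scikit-learn and NLTK; all data below are invented placeholders, not values or text from the study.

```python
# Illustrative sketch only: the study's actual implementation is not given
# in this record. Placeholder arrays stand in for CNN outputs and report text.
import numpy as np
from sklearn.metrics import roc_auc_score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

rng = np.random.default_rng(seed=0)

# --- Mean (SD) per-label AUC over 23 abnormal-sign labels ---
n_cases, n_labels = 500, 23
y_true = rng.integers(0, 2, size=(n_cases, n_labels))   # hypothetical ground truth
noise = rng.random((n_cases, n_labels))
y_score = 0.3 * y_true + 0.7 * noise                    # hypothetical CNN probabilities
aucs = [roc_auc_score(y_true[:, j], y_score[:, j]) for j in range(n_labels)]
print(f"mean (SD) AUC: {np.mean(aucs):.2f} ({np.std(aucs):.2f})")

# --- BLEU similarity between one generated caption and the final report ---
generated = "increased density in the right lower lung field".split()
final_report = "patchy increased density seen in the right lower lung field".split()
smooth = SmoothingFunction().method1   # smoothing avoids zero scores on short texts
bleu = sentence_bleu([final_report], generated, smoothing_function=smooth)
print(f"BLEU: {bleu:.2f}")
```

In the study, per-label AUCs were likewise summarized as a mean (SD) across the 23 signs, and a higher BLEU between the NLP-generated captions and the finalized reports indicates those captions required the least editing by the reporting residents.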