Using deep learning to assist readers during the arbitration process: a lesion-based retrospective evaluation of breast cancer screening performance
Main authors: | Kerschke, Laura; Weigel, Stefanie; Rodriguez-Ruiz, Alejandro; Karssemeijer, Nico; Heindel, Walter |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg, 2021 |
Subjects: | Breast |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8794989/ https://www.ncbi.nlm.nih.gov/pubmed/34383147 http://dx.doi.org/10.1007/s00330-021-08217-w |

author | Kerschke, Laura; Weigel, Stefanie; Rodriguez-Ruiz, Alejandro; Karssemeijer, Nico; Heindel, Walter |
---|---|
collection | PubMed |
description | OBJECTIVES: To evaluate whether artificial intelligence (AI) can discriminate recalled benign from recalled malignant mammographic screening abnormalities to improve screening performance. METHODS: A total of 2257 full-field digital mammography screening examinations, obtained 2011–2013, of women aged 50–69 years who were recalled for further assessment of 295 malignant out of 305 truly malignant lesions and 2289 benign lesions after independent double-reading with arbitration, were included in this retrospective study. A deep learning AI system was used to obtain a score (0–95) for each recalled lesion, representing the likelihood of breast cancer. The lesion-level sensitivity and the proportion of women without false-positive ratings (non-FPR) obtained with AI were estimated as a function of the classification cutoff and compared with those of human readers. RESULTS: Using a cutoff of 1, AI decreased the proportion of women with false-positives from 89.9 to 62.0%, non-FPR 11.1% vs. 38.0% (difference 26.9%, 95% confidence interval 25.1–28.8%; p < .001), preventing 30.1% of reader-induced false-positive recalls, while reducing sensitivity from 96.7 to 91.1% (5.6%, 3.1–8.0%) compared with human reading. The positive predictive value of recall (PPV-1) increased from 12.8 to 16.5% (3.7%, 3.5–4.0%). In women with mass-related lesions (n = 900), the non-FPR was 14.2% for humans vs. 36.7% for AI (22.4%, 19.8–25.3%) at a sensitivity of 98.5% vs. 97.1% (1.5%, 0–3.5%). CONCLUSION: Applying AI during the consensus conference might especially help readers reduce false-positive recalls of masses at the expense of a small reduction in sensitivity. Prospective studies are needed to further evaluate the screening benefit of AI in practice. KEY POINTS: • Integrating artificial intelligence into the arbitration process reduces benign recalls and increases the positive predictive value of recall at the expense of some loss of sensitivity. • Using the artificial intelligence system to aid the decision to recall a woman seems particularly beneficial for masses, where the system reaches sensitivity comparable to that of the readers but with considerably fewer false-positives. • About one-fourth of all recalled malignant lesions are not automatically marked by the system, so their AI score must be retrieved manually by the reader. A thorough reading of screening mammograms by readers to identify suspicious lesions therefore remains mandatory. |
format | Online Article Text |
id | pubmed-8794989 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-8794989; 2022-02-02; Eur Radiol; Breast; Springer Berlin Heidelberg, 2021-08-12, 2022; /pmc/articles/PMC8794989/ /pubmed/34383147 http://dx.doi.org/10.1007/s00330-021-08217-w; Text; en; © The Author(s) 2021, https://creativecommons.org/licenses/by/4.0/. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
title | Using deep learning to assist readers during the arbitration process: a lesion-based retrospective evaluation of breast cancer screening performance |
topic | Breast |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8794989/ https://www.ncbi.nlm.nih.gov/pubmed/34383147 http://dx.doi.org/10.1007/s00330-021-08217-w |
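The METHODS summary in the description field defines two endpoints evaluated as a function of the AI cutoff: lesion-level sensitivity and the proportion of recalled women without false-positive ratings (non-FPR), with PPV-1 reported alongside. The following Python sketch illustrates how such metrics could be computed; it is not the authors' code. It assumes that, under AI, a recalled lesion stays recalled only when its score is at least the cutoff, and that PPV-1 is counted per woman. The names (`RecalledLesion`, `still_recalled`, `lesion_sensitivity`, `non_fpr`, `ppv_recall`) are hypothetical, and the toy data are made up, not study data. The only study figure used is the readers' recall of 295 of 305 malignant lesions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecalledLesion:
    woman_id: int    # hypothetical identifier of the recalled woman
    malignant: bool  # ground truth after assessment
    ai_score: int    # AI likelihood score on the 0-95 scale


def still_recalled(lesion: RecalledLesion, cutoff: Optional[int]) -> bool:
    """Human reading (cutoff=None) keeps every recalled lesion.

    ASSUMPTION: with AI, a lesion stays recalled only if ai_score >= cutoff.
    """
    return cutoff is None or lesion.ai_score >= cutoff


def lesion_sensitivity(lesions, n_true_malignant: int,
                       cutoff: Optional[int] = None) -> float:
    # Denominator counts all true cancers, including ones readers never recalled.
    hits = sum(1 for l in lesions if l.malignant and still_recalled(l, cutoff))
    return hits / n_true_malignant


def non_fpr(lesions, cutoff: Optional[int] = None) -> float:
    # Share of recalled women with no benign lesion that is still recalled.
    women = {l.woman_id for l in lesions}
    with_fp = {l.woman_id for l in lesions
               if not l.malignant and still_recalled(l, cutoff)}
    return 1 - len(with_fp) / len(women)


def ppv_recall(lesions, cutoff: Optional[int] = None) -> float:
    # ASSUMPTION: PPV-1 per woman = recalled women with >= 1 malignant lesion
    # divided by all women who still have >= 1 recalled lesion.
    recalled = {l.woman_id for l in lesions if still_recalled(l, cutoff)}
    with_cancer = {l.woman_id for l in lesions
                   if l.malignant and still_recalled(l, cutoff)}
    return len(with_cancer) / len(recalled)


# Reader sensitivity reported in the abstract: 295 of 305 malignant lesions.
print(f"{295 / 305:.1%}")  # prints 96.7%

# Hypothetical toy data: 4 recalled women, 3 true cancers (one never recalled).
toy = [
    RecalledLesion(1, True, 80),
    RecalledLesion(1, False, 0),
    RecalledLesion(2, False, 5),
    RecalledLesion(3, False, 0),
    RecalledLesion(4, True, 0),
]
for cutoff in (None, 1):
    label = "readers" if cutoff is None else f"AI, cutoff {cutoff}"
    print(label, lesion_sensitivity(toy, 3, cutoff),
          non_fpr(toy, cutoff), ppv_recall(toy, cutoff))
```

Under this rule a higher cutoff can only remove recalls, so non-FPR never decreases while sensitivity never increases. Sweeping `cutoff` over the 0–95 range therefore traces the kind of trade-off the abstract reports.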