
Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis

BACKGROUND: We propose a decision-referral approach for integrating artificial intelligence (AI) into the breast-cancer screening pathway, whereby the algorithm makes predictions on the basis of its quantification of uncertainty. Algorithmic assessments with high certainty are done automatically, wh...


Bibliographic Details
Main Authors: Leibig, Christian, Brehmer, Moritz, Bunk, Stefan, Byng, Danalyn, Pinker, Katja, Umutlu, Lale
Format: Online Article Text
Language: English
Published: 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9839981/
https://www.ncbi.nlm.nih.gov/pubmed/35750400
http://dx.doi.org/10.1016/S2589-7500(22)00070-X
_version_ 1784869553372659712
author Leibig, Christian
Brehmer, Moritz
Bunk, Stefan
Byng, Danalyn
Pinker, Katja
Umutlu, Lale
author_facet Leibig, Christian
Brehmer, Moritz
Bunk, Stefan
Byng, Danalyn
Pinker, Katja
Umutlu, Lale
author_sort Leibig, Christian
collection PubMed
description BACKGROUND: We propose a decision-referral approach for integrating artificial intelligence (AI) into the breast-cancer screening pathway, whereby the algorithm makes predictions on the basis of its quantification of uncertainty. Algorithmic assessments with high certainty are done automatically, whereas assessments with lower certainty are referred to the radiologist. This two-part AI system can triage normal mammography exams and provide post-hoc cancer detection to maintain a high degree of sensitivity. This study aimed to evaluate the performance of this AI system on sensitivity and specificity when used either as a standalone system or within a decision-referral approach, compared with the original radiologist decision.
METHODS: We used a retrospective dataset consisting of 1 193 197 full-field, digital mammography studies carried out between Jan 1, 2007, and Dec 31, 2020, from eight screening sites participating in the German national breast-cancer screening programme. We derived an internal-test dataset from six screening sites (1670 screen-detected cancers and 19 997 normal mammography exams), and an external-test dataset of breast cancer screening exams (2793 screen-detected cancers and 80 058 normal exams) from two additional screening sites to evaluate the performance of an AI algorithm on sensitivity and specificity when used either as a standalone system or within a decision-referral approach, compared with the original individual radiologist decision at the point-of-screen reading ahead of the consensus conference. Different configurations of the AI algorithm were evaluated. To account for the enrichment of the datasets caused by oversampling cancer cases, weights were applied to reflect the actual distribution of study types in the screening programme. Triaging performance was evaluated as the rate of exams correctly identified as normal. Sensitivity across clinically relevant subgroups, screening sites, and device manufacturers was compared between standalone AI, the radiologist, and decision referral. We present receiver operating characteristic (ROC) curves and area under the ROC (AUROC) to evaluate AI-system performance over its entire operating range. Comparison with radiologists and subgroup analysis was based on sensitivity and specificity at clinically relevant configurations.
FINDINGS: The exemplary configuration of the AI system in standalone mode achieved a sensitivity of 84·2% (95% CI 82·4–85·8) and a specificity of 89·5% (89·0–89·9) on internal-test data, and a sensitivity of 84·6% (83·3–85·9) and a specificity of 91·3% (91·1–91·5) on external-test data, but was less accurate than the average unaided radiologist. By contrast, the simulated decision-referral approach significantly improved upon radiologist sensitivity by 2·6 percentage points and specificity by 1·0 percentage points, corresponding to a triaging performance at 63·0% on the external dataset; the AUROC was 0·982 (95% CI 0·978–0·986) on the subset of studies assessed by AI, surpassing radiologist performance. The decision-referral approach also yielded significant increases in sensitivity for a number of clinically relevant subgroups, including subgroups of small lesion sizes and invasive carcinomas. Sensitivity of the decision-referral approach was consistent across the eight included screening sites and three device manufacturers.
INTERPRETATION: The decision-referral approach leverages the strengths of both the radiologist and AI, demonstrating improvements in sensitivity and specificity surpassing that of the individual radiologist and of the standalone AI system. This approach has the potential to improve the screening accuracy of radiologists, is adaptive to the requirements of screening, and could allow for the reduction of workload ahead of the consensus conference, without discarding the generalised knowledge of radiologists.
FUNDING: Vara.
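To make the approach described in the abstract concrete, the following is a minimal sketch of the decision-referral logic and of the weighted evaluation used to correct for the oversampling of cancer cases. It is only a sketch under stated assumptions: a single per-exam malignancy score with two operating thresholds stands in for the system's uncertainty quantification, and all names, thresholds, weights, and toy data below are hypothetical rather than taken from the study.

# Hypothetical sketch only: thresholds, weights, and data are illustrative.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Exam:
    ai_score: float           # model's malignancy score in [0, 1]
    radiologist_recall: bool  # original single-reader decision before consensus
    is_cancer: bool           # ground truth (screen-detected cancer)
    weight: float = 1.0       # reweighting to undo the oversampling of cancers


def decision_referral(exam: Exam, t_normal: float, t_suspicious: float) -> bool:
    """Combined recall decision for one exam under decision referral."""
    if exam.ai_score <= t_normal:      # AI confident the exam is normal
        return False                   # triaged automatically, no reader needed
    if exam.ai_score >= t_suspicious:  # AI confident the exam is suspicious
        return True                    # post-hoc safety net: flag for recall
    return exam.radiologist_recall     # uncertain band: defer to the radiologist


def weighted_metrics(exams: List[Exam], t_normal: float,
                     t_suspicious: float) -> Dict[str, float]:
    """Weighted sensitivity, specificity, and normal-triaging rate."""
    tp = fn = tn = fp = triaged = total = 0.0
    for e in exams:
        recall = decision_referral(e, t_normal, t_suspicious)
        total += e.weight
        if e.ai_score <= t_normal:
            triaged += e.weight        # handled without a radiologist read
        if e.is_cancer:
            if recall:
                tp += e.weight
            else:
                fn += e.weight
        else:
            if recall:
                fp += e.weight
            else:
                tn += e.weight
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "triage_rate": triaged / total,
    }


# Toy usage: cancers are oversampled, so normal exams carry a larger weight.
exams = [
    Exam(ai_score=0.95, radiologist_recall=True,  is_cancer=True),
    Exam(ai_score=0.40, radiologist_recall=True,  is_cancer=True),
    Exam(ai_score=0.05, radiologist_recall=False, is_cancer=False, weight=50.0),
    Exam(ai_score=0.55, radiologist_recall=True,  is_cancer=False, weight=50.0),
]
print(weighted_metrics(exams, t_normal=0.1, t_suspicious=0.9))

In this sketch, exams scored at or below t_normal are triaged as normal without a reader, exams at or above t_suspicious are flagged regardless of the reader, and everything in between falls back to the radiologist's original decision; the per-exam weights let a cancer-enriched test set approximate the sensitivity, specificity, and triaging rate that would be observed at screening prevalence.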
format Online
Article
Text
id pubmed-9839981
institution National Center for Biotechnology Information
language English
publishDate 2022
record_format MEDLINE/PubMed
spelling pubmed-98399812023-01-14 Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis Leibig, Christian Brehmer, Moritz Bunk, Stefan Byng, Danalyn Pinker, Katja Umutlu, Lale Lancet Digit Health Article BACKGROUND: We propose a decision-referral approach for integrating artificial intelligence (AI) into the breast-cancer screening pathway, whereby the algorithm makes predictions on the basis of its quantification of uncertainty. Algorithmic assessments with high certainty are done automatically, whereas assessments with lower certainty are referred to the radiologist. This two-part AI system can triage normal mammography exams and provide post-hoc cancer detection to maintain a high degree of sensitivity. This study aimed to evaluate the performance of this AI system on sensitivity and specificity when used either as a standalone system or within a decision-referral approach, compared with the original radiologist decision. METHODS: We used a retrospective dataset consisting of 1 193 197 full-field, digital mammography studies carried out between Jan 1, 2007, and Dec 31, 2020, from eight screening sites participating in the German national breast-cancer screening programme. We derived an internal-test dataset from six screening sites (1670 screen-detected cancers and 19 997 normal mammography exams), and an external-test dataset of breast cancer screening exams (2793 screen-detected cancers and 80 058 normal exams) from two additional screening sites to evaluate the performance of an AI algorithm on sensitivity and specificity when used either as a standalone system or within a decision-referral approach, compared with the original individual radiologist decision at the point-of-screen reading ahead of the consensus conference. Different configurations of the AI algorithm were evaluated. To account for the enrichment of the datasets caused by oversampling cancer cases, weights were applied to reflect the actual distribution of study types in the screening programme. Triaging performance was evaluated as the rate of exams correctly identified as normal. Sensitivity across clinically relevant subgroups, screening sites, and device manufacturers was compared between standalone AI, the radiologist, and decision referral. We present receiver operating characteristic (ROC) curves and area under the ROC (AUROC) to evaluate AI-system performance over its entire operating range. Comparison with radiologists and subgroup analysis was based on sensitivity and specificity at clinically relevant configurations. FINDINGS: The exemplary configuration of the AI system in standalone mode achieved a sensitivity of 84·2% (95% CI 82·4–85·8) and a specificity of 89·5% (89·0–89·9) on internal-test data, and a sensitivity of 84·6% (83·3–85·9) and a specificity of 91·3% (91·1–91·5) on external-test data, but was less accurate than the average unaided radiologist. By contrast, the simulated decision-referral approach significantly improved upon radiologist sensitivity by 2·6 percentage points and specificity by 1·0 percentage points, corresponding to a triaging performance at 63·0% on the external dataset; the AUROC was 0·982 (95% CI 0·978–0·986) on the subset of studies assessed by AI, surpassing radiologist performance. The decision-referral approach also yielded significant increases in sensitivity for a number of clinically relevant subgroups, including subgroups of small lesion sizes and invasive carcinomas. 
Sensitivity of the decision-referral approach was consistent across the eight included screening sites and three device manufacturers. INTERPRETATION: The decision-referral approach leverages the strengths of both the radiologist and AI, demonstrating improvements in sensitivity and specificity surpassing that of the individual radiologist and of the standalone AI system. This approach has the potential to improve the screening accuracy of radiologists, is adaptive to the requirements of screening, and could allow for the reduction of workload ahead of the consensus conference, without discarding the generalised knowledge of radiologists. FUNDING: Vara. 2022-07 /pmc/articles/PMC9839981/ /pubmed/35750400 http://dx.doi.org/10.1016/S2589-7500(22)00070-X Text en https://creativecommons.org/licenses/by/4.0/ This is an Open Access article under the CC BY 4.0 license
spellingShingle Article
Leibig, Christian
Brehmer, Moritz
Bunk, Stefan
Byng, Danalyn
Pinker, Katja
Umutlu, Lale
Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis
title Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis
title_full Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis
title_fullStr Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis
title_full_unstemmed Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis
title_short Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis
title_sort combining the strengths of radiologists and ai for breast cancer screening: a retrospective analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9839981/
https://www.ncbi.nlm.nih.gov/pubmed/35750400
http://dx.doi.org/10.1016/S2589-7500(22)00070-X
work_keys_str_mv AT leibigchristian combiningthestrengthsofradiologistsandaiforbreastcancerscreeningaretrospectiveanalysis
AT brehmermoritz combiningthestrengthsofradiologistsandaiforbreastcancerscreeningaretrospectiveanalysis
AT bunkstefan combiningthestrengthsofradiologistsandaiforbreastcancerscreeningaretrospectiveanalysis
AT byngdanalyn combiningthestrengthsofradiologistsandaiforbreastcancerscreeningaretrospectiveanalysis
AT pinkerkatja combiningthestrengthsofradiologistsandaiforbreastcancerscreeningaretrospectiveanalysis
AT umutlulale combiningthestrengthsofradiologistsandaiforbreastcancerscreeningaretrospectiveanalysis