
Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multi-reader pilot study of lung cancer detection with chest radiography

OBJECTIVE: To examine whether incorrect AI results impact radiologist performance, and if so, whether human factors can be optimized to reduce error. METHODS: Multi-reader design, 6 radiologists interpreted 90 identical chest radiographs (follow-up CT needed: yes/no) on four occasions (09/20–01/22)....


Bibliographic Details
Main authors:	Bernstein, Michael H., Atalay, Michael K., Dibble, Elizabeth H., Maxwell, Aaron W. P., Karam, Adib R., Agarwal, Saurabh, Ward, Robert C., Healey, Terrance T., Baird, Grayson L.
Format:	Online Article Text
Language:	English
Published:	Springer Berlin Heidelberg 2023
Subjects:	Chest
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10235827/ https://www.ncbi.nlm.nih.gov/pubmed/37266657 http://dx.doi.org/10.1007/s00330-023-09747-1

_version_	1785052779051483136
author	Bernstein, Michael H. Atalay, Michael K. Dibble, Elizabeth H. Maxwell, Aaron W. P. Karam, Adib R. Agarwal, Saurabh Ward, Robert C. Healey, Terrance T. Baird, Grayson L.
author_sort	Bernstein, Michael H.
collection	PubMed
description	OBJECTIVE: To examine whether incorrect AI results impact radiologist performance, and if so, whether human factors can be optimized to reduce error. METHODS: Multi-reader design, 6 radiologists interpreted 90 identical chest radiographs (follow-up CT needed: yes/no) on four occasions (09/20–01/22). No AI result was provided for session 1. Sham AI results were provided for sessions 2–4, and AI for 12 cases were manipulated to be incorrect (8 false positives (FP), 4 false negatives (FN)) (0.87 ROC-AUC). In the Delete AI (No Box) condition, radiologists were told AI results would not be saved for the evaluation. In Keep AI (No Box) and Keep AI (Box), radiologists were told results would be saved. In Keep AI (Box), the ostensible AI program visually outlined the region of suspicion. AI results were constant between conditions. RESULTS: Relative to the No AI condition (FN = 2.7%, FP = 51.4%), FNs and FPs were higher in the Keep AI (No Box) (FN = 33.0%, FP = 86.0%), Delete AI (No Box) (FN = 26.7%, FP = 80.5%), and Keep AI (Box) (FN = 20.7%, FP = 80.5%) conditions (all ps < 0.05). FNs were higher in the Keep AI (No Box) condition (33.0%) than in the Keep AI (Box) condition (20.7%) (p = 0.04). FPs were higher in the Keep AI (No Box) (86.0%) condition than in the Delete AI (No Box) condition (80.5%) (p = 0.03). CONCLUSION: Incorrect AI causes radiologists to make incorrect follow-up decisions when they were correct without AI. This effect is mitigated when radiologists believe AI will be deleted from the patient’s file or a box is provided around the region of interest. CLINICAL RELEVANCE STATEMENT: When AI is wrong, radiologists make more errors than they would have without AI. Based on human factors psychology, our manuscript provides evidence for two AI implementation strategies that reduce the deleterious effects of incorrect AI. KEY POINTS: • When AI provided incorrect results, false negative and false positive rates among the radiologists increased. • False positives decreased when AI results were deleted, versus kept, in the patient’s record. • False negatives and false positives decreased when AI visually outlined the region of suspicion. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00330-023-09747-1.
format	Online Article Text
id	pubmed-10235827
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Springer Berlin Heidelberg
record_format	MEDLINE/PubMed
spelling	pubmed-10235827 2023-06-06 Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multi-reader pilot study of lung cancer detection with chest radiography Bernstein, Michael H. Atalay, Michael K. Dibble, Elizabeth H. Maxwell, Aaron W. P. Karam, Adib R. Agarwal, Saurabh Ward, Robert C. Healey, Terrance T. Baird, Grayson L. Eur Radiol Chest OBJECTIVE: To examine whether incorrect AI results impact radiologist performance, and if so, whether human factors can be optimized to reduce error. METHODS: Multi-reader design, 6 radiologists interpreted 90 identical chest radiographs (follow-up CT needed: yes/no) on four occasions (09/20–01/22). No AI result was provided for session 1. Sham AI results were provided for sessions 2–4, and AI for 12 cases were manipulated to be incorrect (8 false positives (FP), 4 false negatives (FN)) (0.87 ROC-AUC). In the Delete AI (No Box) condition, radiologists were told AI results would not be saved for the evaluation. In Keep AI (No Box) and Keep AI (Box), radiologists were told results would be saved. In Keep AI (Box), the ostensible AI program visually outlined the region of suspicion. AI results were constant between conditions. RESULTS: Relative to the No AI condition (FN = 2.7%, FP = 51.4%), FNs and FPs were higher in the Keep AI (No Box) (FN = 33.0%, FP = 86.0%), Delete AI (No Box) (FN = 26.7%, FP = 80.5%), and Keep AI (Box) (FN = 20.7%, FP = 80.5%) conditions (all ps < 0.05). FNs were higher in the Keep AI (No Box) condition (33.0%) than in the Keep AI (Box) condition (20.7%) (p = 0.04). FPs were higher in the Keep AI (No Box) (86.0%) condition than in the Delete AI (No Box) condition (80.5%) (p = 0.03). CONCLUSION: Incorrect AI causes radiologists to make incorrect follow-up decisions when they were correct without AI. This effect is mitigated when radiologists believe AI will be deleted from the patient’s file or a box is provided around the region of interest. CLINICAL RELEVANCE STATEMENT: When AI is wrong, radiologists make more errors than they would have without AI. Based on human factors psychology, our manuscript provides evidence for two AI implementation strategies that reduce the deleterious effects of incorrect AI. KEY POINTS: • When AI provided incorrect results, false negative and false positive rates among the radiologists increased. • False positives decreased when AI results were deleted, versus kept, in the patient’s record. • False negatives and false positives decreased when AI visually outlined the region of suspicion. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00330-023-09747-1. Springer Berlin Heidelberg 2023-06-02 2023 /pmc/articles/PMC10235827/ /pubmed/37266657 http://dx.doi.org/10.1007/s00330-023-09747-1 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle	Chest Bernstein, Michael H. Atalay, Michael K. Dibble, Elizabeth H. Maxwell, Aaron W. P. Karam, Adib R. Agarwal, Saurabh Ward, Robert C. Healey, Terrance T. Baird, Grayson L. Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multi-reader pilot study of lung cancer detection with chest radiography
title	Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multi-reader pilot study of lung cancer detection with chest radiography
topic	Chest
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10235827/ https://www.ncbi.nlm.nih.gov/pubmed/37266657 http://dx.doi.org/10.1007/s00330-023-09747-1
