
Improving reference standards for validation of AI-based radiography


Bibliographic Details
Main authors: Duggan, Gavin E; Reicher, Joshua J; Liu, Yun; Tse, Daniel; Shetty, Shravya
Format: Online Article, Text
Language: English
Published: The British Institute of Radiology, 2021
Subjects: Full Paper
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248225/
https://www.ncbi.nlm.nih.gov/pubmed/34142868
http://dx.doi.org/10.1259/bjr.20210435

Abstract
OBJECTIVE: Demonstrate the importance of combining multiple readers' opinions, in a context-aware manner, when establishing the reference standard for validation of artificial intelligence (AI) applications for, e.g., chest radiographs. By comparing individual readers, majority vote of a panel, and panel-based discussion, we identify methods that maximize interobserver agreement and label reproducibility.
METHODS: 1100 frontal chest radiographs were evaluated for 6 findings: airspace opacity, cardiomegaly, pulmonary edema, fracture, nodules, and pneumothorax. Each image was reviewed by six radiologists, first individually and then via asynchronous adjudication (web-based discussion) in two panels of three readers to resolve disagreements within each panel. We quantified the reproducibility of each method by measuring interreader agreement.
RESULTS: Panel-based majority vote improved agreement relative to individual readers for all findings. Most disagreements were resolved with two rounds of adjudication, which further improved reproducibility for some findings, particularly by reducing misses. Improvements varied across finding categories, with adjudication improving agreement for cardiomegaly, fractures, and pneumothorax.
CONCLUSION: The likelihood of interreader agreement, even within panels of US board-certified radiologists, must be considered before reads can be used as a reference standard for validation of proposed AI tools. Agreement and, by extension, reproducibility can be improved by applying majority vote, maximum sensitivity, or asynchronous adjudication for different findings, which supports the development of higher quality clinical research.
ADVANCES IN KNOWLEDGE: A panel of three experts is a common technique for establishing reference standards when ground truth is not available for use in AI validation. The manner in which differing opinions are resolved is shown to be important, and has not been previously explored.
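The abstract describes collapsing each panel's three reads into one label (by majority vote or by "maximum sensitivity") and then judging reproducibility by how well the two independent panels agree. The Python sketch below illustrates only that bookkeeping, assuming binary per-image reads for a single finding. All function names and the toy reads are hypothetical. Reading "maximum sensitivity" as "positive if any reader is positive" is an interpretation, and using percent agreement and Cohen's kappa as the agreement statistics is an assumption: the abstract does not name the statistic the authors used. Asynchronous adjudication is a human discussion step and is not modelled.

```python
from typing import Sequence

# Hypothetical helpers illustrating panel label aggregation and agreement.
# Assumption: each read is 0 (finding absent) or 1 (finding present) for one
# finding on one image; each panel has three readers, as in the study design.


def majority_vote(reads: Sequence[int]) -> int:
    """Panel label is positive when more than half of the readers are positive.

    With three readers there are no ties; for an even-sized panel a tie
    counts as negative under this strict-majority rule.
    """
    return int(2 * sum(reads) > len(reads))


def max_sensitivity(reads: Sequence[int]) -> int:
    """Panel label is positive when any reader is positive (fewest misses)."""
    return int(any(reads))


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of images on which two label sets agree."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two binary label sets (agreement beyond chance)."""
    p_o = percent_agreement(a, b)
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    # Chance agreement: both positive or both negative if labels were independent.
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1.0:
        # Undefined when both label sets are the same constant label.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy reads invented for illustration: 6 images x 3 readers per panel.
    panel_a = [(1, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 1), (0, 1, 0), (0, 0, 1)]
    panel_b = [(1, 0, 1), (0, 0, 0), (0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 1)]
    for name, rule in [("majority vote", majority_vote),
                       ("max sensitivity", max_sensitivity)]:
        labels_a = [rule(r) for r in panel_a]
        labels_b = [rule(r) for r in panel_b]
        print(f"{name}: agreement={percent_agreement(labels_a, labels_b):.3f}, "
              f"kappa={cohens_kappa(labels_a, labels_b):.3f}")
```

Strict majority is the natural rule for a three-reader panel, while the any-positive rule trades specificity for fewer misses; choosing between them per finding is the kind of context-aware decision the abstract argues for.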
Record ID: pubmed-8248225 (MEDLINE/PubMed, National Center for Biotechnology Information)
Journal: Br J Radiol
Publication dates: 2021-07-01; 2021-06-17 (online)
© 2021 The Authors. Published by the British Institute of Radiology. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 Unported License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.