
Effects of Expert-Determined Reference Standards in Evaluating the Diagnostic Performance of a Deep Learning Model: A Malignant Lung Nodule Detection Task on Chest Radiographs

OBJECTIVE: Little is known about the effects of using different expert-determined reference standards when evaluating the performance of deep learning-based automatic detection (DLAD) models and their added value to radiologists. We assessed the concordance of expert-determined standards with a clin...

Bibliographic Details
Main authors:	Huh, Jung Eun; Lee, Jong Hyuk; Hwang, Eui Jin; Park, Chang Min
Format:	Online Article Text
Language:	English
Published:	The Korean Society of Radiology 2023
Subjects:	Thoracic Imaging
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9892220/ https://www.ncbi.nlm.nih.gov/pubmed/36725356 http://dx.doi.org/10.3348/kjr.2022.0548

collection	PubMed
description	OBJECTIVE: Little is known about the effects of using different expert-determined reference standards when evaluating the performance of deep learning-based automatic detection (DLAD) models and their added value to radiologists. We assessed the concordance of expert-determined standards with a clinical gold standard (herein, pathological confirmation) and the effects of different expert-determined reference standards on the estimates of radiologists’ diagnostic performance in detecting malignant pulmonary nodules on chest radiographs with and without the assistance of a DLAD model. MATERIALS AND METHODS: This study included chest radiographs from 50 patients with pathologically proven lung cancer and 50 controls. Five expert-determined standards were constructed using the interpretations of 10 experts: individual judgment by the most experienced expert, majority vote, consensus judgments of two and three experts, and a latent class analysis (LCA) model. In separate reader tests, an additional 10 radiologists interpreted the radiographs independently and then with the assistance of the DLAD model. Their diagnostic performance was estimated using the clinical gold standard and the various expert-determined standards as the reference standard, and the results were compared using the t test with Bonferroni correction. RESULTS: The LCA model (sensitivity, 72.6%; specificity, 100%) was most similar to the clinical gold standard. When expert-determined standards were used, the sensitivities of the radiologists and of the DLAD model alone were overestimated, and their specificities were underestimated (all p-values < 0.05). DLAD assistance diminished the overestimation of sensitivity but exaggerated the underestimation of specificity (all p-values < 0.001). The DLAD model improved sensitivity and specificity to a greater extent when using the clinical gold standard than when using the expert-determined standards (all p-values < 0.001), except for sensitivity with the LCA model (p = 0.094).
CONCLUSION: The LCA model was most similar to the clinical gold standard for malignant pulmonary nodule detection on chest radiographs. Expert-determined standards caused bias in measuring the diagnostic performance of the artificial intelligence model.
format	Online Article Text
id	pubmed-9892220
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	The Korean Society of Radiology
record_format	MEDLINE/PubMed
spelling	pubmed-9892220 2023-02-14 Korean J Radiol Thoracic Imaging The Korean Society of Radiology 2023-02 2023-01-18 /pmc/articles/PMC9892220/ /pubmed/36725356 http://dx.doi.org/10.3348/kjr.2022.0548 Text en Copyright © 2023 The Korean Society of Radiology. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Effects of Expert-Determined Reference Standards in Evaluating the Diagnostic Performance of a Deep Learning Model: A Malignant Lung Nodule Detection Task on Chest Radiographs
topic	Thoracic Imaging
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9892220/ https://www.ncbi.nlm.nih.gov/pubmed/36725356 http://dx.doi.org/10.3348/kjr.2022.0548
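
The abstract's central comparison is scoring the same reader against a pathological gold standard and against an expert-determined standard. It can be sketched as below. This is a minimal illustration, not the authors' analysis code: the eight cases, the three expert reads, and the reader's calls are invented toy data, the function names are hypothetical, and resolving ties as negative is an assumption.

```python
from typing import List, Sequence, Tuple


def majority_vote(expert_reads: Sequence[Sequence[int]]) -> List[int]:
    """Per-case reference label: 1 if more than half of the experts called it positive.

    Ties (possible with an even number of experts) resolve to 0; this is an
    assumption made for the sketch, not a rule taken from the study.
    """
    n_experts = len(expert_reads)
    return [int(2 * sum(case) > n_experts) for case in zip(*expert_reads)]


def sensitivity_specificity(
    calls: Sequence[int], reference: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity and specificity of binary calls against a binary reference."""
    tp = sum(1 for c, r in zip(calls, reference) if c == 1 and r == 1)
    fn = sum(1 for c, r in zip(calls, reference) if c == 0 and r == 1)
    tn = sum(1 for c, r in zip(calls, reference) if c == 0 and r == 0)
    fp = sum(1 for c, r in zip(calls, reference) if c == 1 and r == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 4 pathologically proven cancers, then 4 controls.
gold = [1, 1, 1, 1, 0, 0, 0, 0]

# Three hypothetical experts who miss the subtler cancers.
experts = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
]
expert_reference = majority_vote(experts)

# A hypothetical reader who finds one cancer the expert majority missed.
reader = [1, 1, 1, 0, 0, 0, 0, 0]

sens_gold, spec_gold = sensitivity_specificity(reader, gold)
sens_vote, spec_vote = sensitivity_specificity(reader, expert_reference)

print(f"gold standard: sensitivity={sens_gold:.2f}, specificity={spec_gold:.2f}")
print(f"majority vote: sensitivity={sens_vote:.2f}, specificity={spec_vote:.2f}")
```

In this toy setup, the cancer that the expert majority missed is labeled negative in the expert reference. The reader's correct call on it is therefore counted as a false positive, and the cancer nobody found drops out of the sensitivity denominator. Sensitivity comes out higher and specificity lower than under the gold standard, the same direction of bias the abstract reports.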