Clinically focused multi-cohort benchmarking as a tool for external validation of artificial intelligence algorithm performance in basic chest radiography analysis

Artificial intelligence (AI) algorithms evaluating [supine] chest radiographs ([S]CXRs) have increased remarkably in number in recent years. Since training and validation are often performed on subsets of the same overall dataset, external validation is mandatory to reproduce results and reveal potential training errors. We applied multi-cohort benchmarking to the publicly accessible [S]CXR-analyzing AI algorithm CheXNet, comprising three clinically relevant study cohorts that differ in patient positioning ([S]CXRs), applied reference standards (CT-/[S]CXR-based), and the possibility of comparing algorithm classification with the reading performance of different medical experts.

The study cohorts include [1] 563 CXRs acquired in the emergency unit and evaluated by 9 readers (radiologists and non-radiologists) for 4 common pathologies, [2] 6,248 SCXRs annotated by radiologists for pneumothorax presence, pneumothorax size, and inserted thoracic tube material, which allowed for subgroup and confounding-bias analysis, and [3] 166 patients whose SCXRs were evaluated by radiologists for underlying causes of basal lung opacities, each case correlated with a timely acquired computed tomography scan (SCXR and CT within 90 min).

CheXNet non-significantly exceeded the radiology resident (RR) consensus in the detection of suspicious lung nodules (cohort [1], AUC AI/RR: 0.851/0.839, p = 0.793) and the radiological readers in the detection of basal pneumonia (cohort [3], AUC AI/reader consensus: 0.825/0.782, p = 0.390) and basal pleural effusion (cohort [3], AUC AI/reader consensus: 0.762/0.710, p = 0.336) in SCXRs, in part with AUC values higher than originally published (“Nodule”: 0.780, “Infiltration”: 0.735, “Effusion”: 0.864). The “Infiltration” classifier proved highly dependent on patient positioning (best in CXR, worst in SCXR). The pneumothorax SCXR cohort [2] revealed poor algorithm performance in SCXRs without inserted thoracic material and in the detection of small pneumothoraces, which can be explained by a known systematic confounding error in the algorithm training process. The differences in algorithm performance compared to the original publication demonstrate the benefit of clinically relevant external validation. Our multi-cohort benchmarking enables consideration of confounders, different reference standards, and patient positioning, as well as comparison of AI performance with that of differently qualified medical readers.
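To make the reported comparisons concrete, below is a minimal, illustrative Python sketch of how an AI classifier's AUC can be compared against a reader consensus on the same cases, as in the AUC/p-value pairs quoted above. The paper does not publish its analysis code; the synthetic data, function names, and the paired bootstrap test used here are assumptions for illustration, not the authors' method.

```python
# Illustrative sketch only: comparing an AI model's AUC against a reader
# consensus AUC on the same cases (cf. "Nodule": AUC AI/RR 0.851/0.839,
# p = 0.793 in the abstract). Data and test choice are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for one cohort: binary ground truth, AI scores in
# [0, 1], and a reader-consensus score (e.g. averaged confidence ratings).
n = 563  # cohort [1] size from the abstract
y = rng.integers(0, 2, size=n)
ai_scores = np.clip(y * 0.60 + rng.normal(0.30, 0.25, size=n), 0, 1)
reader_scores = np.clip(y * 0.55 + rng.normal(0.30, 0.28, size=n), 0, 1)

def paired_bootstrap_auc_test(y_true, s1, s2, n_boot=2000, seed=0):
    """Two-sided paired bootstrap p-value for the AUC difference of two
    scorers evaluated on the same cases (resampling cases with replacement)."""
    boot_rng = np.random.default_rng(seed)
    observed = roc_auc_score(y_true, s1) - roc_auc_score(y_true, s2)
    idx = np.arange(len(y_true))
    diffs = []
    while len(diffs) < n_boot:
        b = boot_rng.choice(idx, size=len(idx), replace=True)
        if len(np.unique(y_true[b])) < 2:  # an AUC needs both classes
            continue
        diffs.append(roc_auc_score(y_true[b], s1[b]) -
                     roc_auc_score(y_true[b], s2[b]))
    diffs = np.asarray(diffs)
    # Two-sided p-value: how often the bootstrap difference crosses zero.
    p = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return observed, min(p, 1.0)

delta, p = paired_bootstrap_auc_test(y, ai_scores, reader_scores)
print(f"AUC(AI) = {roc_auc_score(y, ai_scores):.3f}, "
      f"AUC(readers) = {roc_auc_score(y, reader_scores):.3f}, "
      f"delta = {delta:.3f}, p = {p:.3f}")
```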

Bibliographic Details
Main Authors: Rudolph, Jan, Schachtner, Balthasar, Fink, Nicola, Koliogiannis, Vanessa, Schwarze, Vincent, Goller, Sophia, Trappmann, Lena, Hoppe, Boj F., Mansour, Nabeel, Fischer, Maximilian, Ben Khaled, Najib, Jörgens, Maximilian, Dinkel, Julien, Kunz, Wolfgang G., Ricke, Jens, Ingrisch, Michael, Sabel, Bastian O., Rueckel, Johannes
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK, 2022 (Sci Rep, 2022-07-27)
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9329327/
https://www.ncbi.nlm.nih.gov/pubmed/35896763
http://dx.doi.org/10.1038/s41598-022-16514-7
Rights: © The Author(s) 2022. Open Access under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).