Clinically focused multi-cohort benchmarking as a tool for external validation of artificial intelligence algorithm performance in basic chest radiography analysis
Artificial intelligence (AI) algorithms evaluating [supine] chest radiographs ([S]CXRs) have remarkably increased in number recently. Since training and validation are often performed on subsets of the same overall dataset, external validation is mandatory to reproduce results and reveal potential t...
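Main Authors: | Rudolph, Jan; Schachtner, Balthasar; Fink, Nicola; Koliogiannis, Vanessa; Schwarze, Vincent; Goller, Sophia; Trappmann, Lena; Hoppe, Boj F.; Mansour, Nabeel; Fischer, Maximilian; Ben Khaled, Najib; Jörgens, Maximilian; Dinkel, Julien; Kunz, Wolfgang G.; Ricke, Jens; Ingrisch, Michael; Sabel, Bastian O.; Rueckel, Johannes |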
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Nature Publishing Group UK
2022
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9329327/ https://www.ncbi.nlm.nih.gov/pubmed/35896763 http://dx.doi.org/10.1038/s41598-022-16514-7 |
_version_ | 1784757896221818880 |
---|---|
author | Rudolph, Jan Schachtner, Balthasar Fink, Nicola Koliogiannis, Vanessa Schwarze, Vincent Goller, Sophia Trappmann, Lena Hoppe, Boj F. Mansour, Nabeel Fischer, Maximilian Ben Khaled, Najib Jörgens, Maximilian Dinkel, Julien Kunz, Wolfgang G. Ricke, Jens Ingrisch, Michael Sabel, Bastian O. Rueckel, Johannes |
author_sort | Rudolph, Jan |
collection | PubMed |
description | Artificial intelligence (AI) algorithms evaluating [supine] chest radiographs ([S]CXRs) have remarkably increased in number recently. Since training and validation are often performed on subsets of the same overall dataset, external validation is mandatory to reproduce results and reveal potential training errors. We applied a multicohort benchmarking to the publicly accessible (S)CXR analyzing AI algorithm CheXNet, comprising three clinically relevant study cohorts which differ in patient positioning ([S]CXRs), the applied reference standards (CT-/[S]CXR-based) and the possibility to also compare algorithm classification with different medical experts’ reading performance. The study cohorts include [1] a cohort, characterized by 563 CXRs acquired in the emergency unit that were evaluated by 9 readers (radiologists and non-radiologists) in terms of 4 common pathologies, [2] a collection of 6,248 SCXRs annotated by radiologists in terms of pneumothorax presence, its size and presence of inserted thoracic tube material which allowed for subgroup and confounding bias analysis and [3] a cohort consisting of 166 patients with SCXRs that were evaluated by radiologists for underlying causes of basal lung opacities, all of those cases having been correlated to a timely acquired computed tomography scan (SCXR and CT within < 90 min). CheXNet non-significantly exceeded the radiology resident (RR) consensus in the detection of suspicious lung nodules (cohort [1], AUC AI/RR: 0.851/0.839, p = 0.793) and the radiological readers in the detection of basal pneumonia (cohort [3], AUC AI/reader consensus: 0.825/0.782, p = 0.390) and basal pleural effusion (cohort [3], AUC AI/reader consensus: 0.762/0.710, p = 0.336) in SCXR, partly with AUC values higher than originally published (“Nodule”: 0.780, “Infiltration”: 0.735, “Effusion”: 0.864). The classifier “Infiltration” turned out to be very dependent on patient positioning (best in CXR, worst in SCXR). The pneumothorax SCXR cohort [2] revealed poor algorithm performance in CXRs without inserted thoracic material and in the detection of small pneumothoraces, which can be explained by a known systematic confounding error in the algorithm training process. The benefit of clinically relevant external validation is demonstrated by the differences in algorithm performance as compared to the original publication. Our multi-cohort benchmarking finally enables the consideration of confounders, different reference standards and patient positioning as well as the AI performance comparison with differentially qualified medical readers. |
format | Online Article Text |
id | pubmed-9329327 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-93293272022-07-29 Clinically focused multi-cohort benchmarking as a tool for external validation of artificial intelligence algorithm performance in basic chest radiography analysis Rudolph, Jan Schachtner, Balthasar Fink, Nicola Koliogiannis, Vanessa Schwarze, Vincent Goller, Sophia Trappmann, Lena Hoppe, Boj F. Mansour, Nabeel Fischer, Maximilian Ben Khaled, Najib Jörgens, Maximilian Dinkel, Julien Kunz, Wolfgang G. Ricke, Jens Ingrisch, Michael Sabel, Bastian O. Rueckel, Johannes Sci Rep Article Artificial intelligence (AI) algorithms evaluating [supine] chest radiographs ([S]CXRs) have remarkably increased in number recently. Since training and validation are often performed on subsets of the same overall dataset, external validation is mandatory to reproduce results and reveal potential training errors. We applied a multicohort benchmarking to the publicly accessible (S)CXR analyzing AI algorithm CheXNet, comprising three clinically relevant study cohorts which differ in patient positioning ([S]CXRs), the applied reference standards (CT-/[S]CXR-based) and the possibility to also compare algorithm classification with different medical experts’ reading performance. The study cohorts include [1] a cohort, characterized by 563 CXRs acquired in the emergency unit that were evaluated by 9 readers (radiologists and non-radiologists) in terms of 4 common pathologies, [2] a collection of 6,248 SCXRs annotated by radiologists in terms of pneumothorax presence, its size and presence of inserted thoracic tube material which allowed for subgroup and confounding bias analysis and [3] a cohort consisting of 166 patients with SCXRs that were evaluated by radiologists for underlying causes of basal lung opacities, all of those cases having been correlated to a timely acquired computed tomography scan (SCXR and CT within < 90 min). 
CheXNet non-significantly exceeded the radiology resident (RR) consensus in the detection of suspicious lung nodules (cohort [1], AUC AI/RR: 0.851/0.839, p = 0.793) and the radiological readers in the detection of basal pneumonia (cohort [3], AUC AI/reader consensus: 0.825/0.782, p = 0.390) and basal pleural effusion (cohort [3], AUC AI/reader consensus: 0.762/0.710, p = 0.336) in SCXR, partly with AUC values higher than originally published (“Nodule”: 0.780, “Infiltration”: 0.735, “Effusion”: 0.864). The classifier “Infiltration” turned out to be very dependent on patient positioning (best in CXR, worst in SCXR). The pneumothorax SCXR cohort [2] revealed poor algorithm performance in CXRs without inserted thoracic material and in the detection of small pneumothoraces, which can be explained by a known systematic confounding error in the algorithm training process. The benefit of clinically relevant external validation is demonstrated by the differences in algorithm performance as compared to the original publication. Our multi-cohort benchmarking finally enables the consideration of confounders, different reference standards and patient positioning as well as the AI performance comparison with differentially qualified medical readers. Nature Publishing Group UK 2022-07-27 /pmc/articles/PMC9329327/ /pubmed/35896763 http://dx.doi.org/10.1038/s41598-022-16514-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Rudolph, Jan Schachtner, Balthasar Fink, Nicola Koliogiannis, Vanessa Schwarze, Vincent Goller, Sophia Trappmann, Lena Hoppe, Boj F. Mansour, Nabeel Fischer, Maximilian Ben Khaled, Najib Jörgens, Maximilian Dinkel, Julien Kunz, Wolfgang G. Ricke, Jens Ingrisch, Michael Sabel, Bastian O. Rueckel, Johannes Clinically focused multi-cohort benchmarking as a tool for external validation of artificial intelligence algorithm performance in basic chest radiography analysis |
title | Clinically focused multi-cohort benchmarking as a tool for external validation of artificial intelligence algorithm performance in basic chest radiography analysis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9329327/ https://www.ncbi.nlm.nih.gov/pubmed/35896763 http://dx.doi.org/10.1038/s41598-022-16514-7 |