Performance of a Chest Radiograph AI Diagnostic Tool for COVID-19: A Prospective Observational Study
PURPOSE: To conduct a prospective observational study across 12 U.S. hospitals to evaluate real-time performance of an interpretable artificial intelligence (AI) model to detect COVID-19 on chest radiographs. MATERIALS AND METHODS: A total of 95 363 chest radiographs were included in model training,...
Main authors: | Sun, Ju; Peng, Le; Li, Taihui; Adila, Dyah; Zaiman, Zach; Melton-Meaux, Genevieve B.; Ingraham, Nicholas E.; Murray, Eric; Boley, Daniel; Switzer, Sean; Burns, John L.; Huang, Kun; Allen, Tadashi; Steenburg, Scott D.; Gichoya, Judy Wawira; Kummerfeld, Erich; Tignanelli, Christopher J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Radiological Society of North America, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344211/ https://www.ncbi.nlm.nih.gov/pubmed/35923381 http://dx.doi.org/10.1148/ryai.210217 |
_version_ | 1784761169483923456 |
---|---|
author | Sun, Ju; Peng, Le; Li, Taihui; Adila, Dyah; Zaiman, Zach; Melton-Meaux, Genevieve B.; Ingraham, Nicholas E.; Murray, Eric; Boley, Daniel; Switzer, Sean; Burns, John L.; Huang, Kun; Allen, Tadashi; Steenburg, Scott D.; Gichoya, Judy Wawira; Kummerfeld, Erich; Tignanelli, Christopher J. |
author_sort | Sun, Ju |
collection | PubMed |
description | PURPOSE: To conduct a prospective observational study across 12 U.S. hospitals to evaluate real-time performance of an interpretable artificial intelligence (AI) model to detect COVID-19 on chest radiographs. MATERIALS AND METHODS: A total of 95 363 chest radiographs were included in model training, external validation, and real-time validation. The model was deployed as a clinical decision support system, and performance was prospectively evaluated. There were 5335 total real-time predictions and a COVID-19 prevalence of 4.8% (258 of 5335). Model performance was assessed with use of receiver operating characteristic analysis, precision-recall curves, and F1 score. Logistic regression was used to evaluate the association of race and sex with AI model diagnostic accuracy. To compare model accuracy with the performance of board-certified radiologists, a third dataset of 1638 images was read independently by two radiologists. RESULTS: Participants positive for COVID-19 had higher COVID-19 diagnostic scores than participants negative for COVID-19 (median, 0.1 [IQR, 0.0–0.8] vs 0.0 [IQR, 0.0–0.1], respectively; P < .001). Real-time model performance was unchanged over 19 weeks of implementation (area under the receiver operating characteristic curve, 0.70; 95% CI: 0.66, 0.73). Model sensitivity was higher in men than women (P = .01), whereas model specificity was higher in women (P = .001). Sensitivity was higher for Asian (P = .002) and Black (P = .046) participants compared with White participants. The COVID-19 AI diagnostic system had worse accuracy (63.5% correct) compared with radiologist predictions (radiologist 1 = 67.8% correct, radiologist 2 = 68.6% correct; McNemar P < .001 for both). CONCLUSION: AI-based tools have not yet reached full diagnostic potential for COVID-19 and underperform compared with radiologist prediction. Keywords: Diagnosis, Classification, Application Domain, Infection, Lung. Supplemental material is available for this article. © RSNA, 2022 |
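The description above names the study's evaluation methods: receiver operating characteristic (ROC) analysis, precision-recall curves, F1 score, logistic regression for subgroup associations, and McNemar tests against radiologist reads. The sketch below is not the authors' code; it only illustrates, using scikit-learn and statsmodels on synthetic stand-in data, how such metrics are typically computed. The 4.8% prevalence and ~68% radiologist accuracy are taken from the abstract; the AI scores, the 0.5 threshold, and the covariate are invented for illustration.

```python
# Minimal sketch (not the study's implementation) of the evaluation metrics named
# in the abstract, run on synthetic stand-in data.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score, precision_recall_curve, f1_score
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)
n = 5335                                     # number of real-time predictions in the study
y_true = rng.random(n) < 0.048               # ~4.8% COVID-19 prevalence, per the abstract
y_score = np.clip(rng.normal(0.05 + 0.3 * y_true, 0.2), 0, 1)  # hypothetical AI scores

# ROC analysis: AUC summarizes discrimination across all thresholds.
auc = roc_auc_score(y_true, y_score)

# Precision-recall curve, and F1 at one (arbitrary) operating threshold of 0.5.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
y_pred = y_score >= 0.5
f1 = f1_score(y_true, y_pred)

# Logistic regression of per-case correctness on a hypothetical binary covariate
# (the study modeled race and sex; here the covariate is random).
ai_correct = y_pred == y_true
covariate = rng.integers(0, 2, n).astype(float)
logit = sm.Logit(ai_correct.astype(float), sm.add_constant(covariate)).fit(disp=0)

# McNemar test: paired comparison of AI vs radiologist correctness on the same cases.
rad_correct = rng.random(n) < 0.68           # stand-in for ~68% radiologist accuracy
table = [[np.sum(ai_correct & rad_correct), np.sum(ai_correct & ~rad_correct)],
         [np.sum(~ai_correct & rad_correct), np.sum(~ai_correct & ~rad_correct)]]
mcnemar_result = mcnemar(table, exact=False, correction=True)

print(f"AUC={auc:.2f}  F1={f1:.2f}  "
      f"covariate OR={np.exp(logit.params[1]):.2f}  "
      f"McNemar p={mcnemar_result.pvalue:.3g}")
```

The printed values are meaningless beyond demonstrating the calls; the study's reported figures (AUC 0.70; radiologist accuracy of 67.8% and 68.6%) come from real clinical labels and reads.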
format | Online Article Text |
id | pubmed-9344211 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Radiological Society of North America |
record_format | MEDLINE/PubMed |
spelling | pubmed-9344211 2022-08-02 Performance of a Chest Radiograph AI Diagnostic Tool for COVID-19: A Prospective Observational Study Radiol Artif Intell Original Research Radiological Society of North America 2022-06-01 /pmc/articles/PMC9344211/ /pubmed/35923381 http://dx.doi.org/10.1148/ryai.210217 Text en © 2022 by the Radiological Society of North America, Inc. This article is made available via the PMC Open Access Subset for unrestricted re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the COVID-19 pandemic or until permissions are revoked in writing. Upon expiration of these permissions, PMC is granted a perpetual license to make this article available via PMC and Europe PMC, consistent with existing copyright protections. |
title | Performance of a Chest Radiograph AI Diagnostic Tool for COVID-19: A Prospective Observational Study |
title_sort | performance of a chest radiograph ai diagnostic tool for covid-19: a prospective observational study |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344211/ https://www.ncbi.nlm.nih.gov/pubmed/35923381 http://dx.doi.org/10.1148/ryai.210217 |