
Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models

Bibliographic details
Main authors: Young, Albert T., Fernandez, Kristen, Pfau, Jacob, Reddy, Rasika, Cao, Nhat Anh, von Franque, Max Y., Johal, Arjun, Wu, Benjamin V., Wu, Rachel R., Chen, Jennifer Y., Fadadu, Raj P., Vasquez, Juan A., Tam, Andrew, Keiser, Michael J., Wei, Maria L.
Format: Online article (text)
Language: English
Published: Nature Publishing Group UK, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820258/
https://www.ncbi.nlm.nih.gov/pubmed/33479460
http://dx.doi.org/10.1038/s41746-020-00380-6
Collection: PubMed
Abstract: Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related performance metric is ready for clinical use. Here, we systematically assessed the performance of dermatologist-level convolutional neural networks (CNNs) on real-world non-curated images by applying computational “stress tests”. Our goal was to create a proxy environment in which to comprehensively test the generalizability of off-the-shelf CNNs developed without training or evaluation protocols specific to individual clinics. We found inconsistent predictions on images captured repeatedly in the same setting or subjected to simple transformations (e.g., rotation). Such transformations resulted in false positive or negative predictions for 6.5–22% of skin lesions across test datasets. Our findings indicate that models meeting conventionally reported metrics need further validation with computational stress tests to assess clinic readiness.
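The core idea of a transformation stress test, re-scoring the same lesion image after simple, label-preserving changes such as rotation and checking whether the thresholded prediction changes, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' protocol: the transformation set (90° rotations and flips), the 0.5 decision threshold, the hypothetical `inconsistency_rate` helper, and the `predict` callable standing in for a trained CNN are all choices made here for illustration.

```python
import numpy as np
from typing import Callable, Dict, Sequence

# Hypothetical stand-in for a trained CNN: maps an H x W x C image array
# to a melanoma probability in [0, 1].
Classifier = Callable[[np.ndarray], float]

# Illustrative, label-preserving transformations (rotations and flips).
# This set is an assumption, not the paper's exact stress-test battery.
TRANSFORMS: Dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "identity": lambda img: img,
    "rot90": lambda img: np.rot90(img, k=1, axes=(0, 1)),
    "rot180": lambda img: np.rot90(img, k=2, axes=(0, 1)),
    "rot270": lambda img: np.rot90(img, k=3, axes=(0, 1)),
    "hflip": lambda img: np.flip(img, axis=1),
    "vflip": lambda img: np.flip(img, axis=0),
}


def inconsistency_rate(
    predict: Classifier,
    images: Sequence[np.ndarray],
    threshold: float = 0.5,
    transforms: Dict[str, Callable[[np.ndarray], np.ndarray]] = TRANSFORMS,
) -> float:
    """Fraction of images whose binary call (prob >= threshold) differs
    between at least two transformed versions of the same image."""
    if len(images) == 0:
        raise ValueError("images must be non-empty")
    inconsistent = 0
    for img in images:
        # Collect the set of distinct binary calls across all transformations.
        calls = {int(predict(t(img)) >= threshold) for t in transforms.values()}
        if len(calls) > 1:  # both "benign" and "melanoma" were predicted
            inconsistent += 1
    return inconsistent / len(images)


if __name__ == "__main__":
    # Toy data: 2 x 2 single-channel "images".
    blank = np.zeros((2, 2, 1))
    corner = np.zeros((2, 2, 1))
    corner[0, 0, 0] = 1.0
    full = np.ones((2, 2, 1))
    images = [blank, corner, full]

    # A deliberately orientation-sensitive classifier: reads only the top-left pixel.
    fragile = lambda img: float(img[0, 0, 0])
    # An orientation-invariant classifier: mean intensity.
    robust = lambda img: float(img.mean())

    print(inconsistency_rate(fragile, images))  # 0.3333333333333333 (1 of 3 images flips)
    print(inconsistency_rate(robust, images))   # 0.0
```

This sketch measures prediction inconsistency only; counting false positives or negatives, as the abstract reports, would additionally compare each transformed call against the ground-truth label. Note also that a 90° rotation of a non-square array swaps height and width, so a model with a fixed input size would need resizing or padding after rotation.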
Record ID: pubmed-7820258
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: NPJ Digit Med
Publication date: 2021-01-21
Copyright: © This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply (2021).
License: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.