Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models
Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related...
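Main authors: | Young, Albert T.; Fernandez, Kristen; Pfau, Jacob; Reddy, Rasika; Cao, Nhat Anh; von Franque, Max Y.; Johal, Arjun; Wu, Benjamin V.; Wu, Rachel R.; Chen, Jennifer Y.; Fadadu, Raj P.; Vasquez, Juan A.; Tam, Andrew; Keiser, Michael J.; Wei, Maria L. |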
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820258/ https://www.ncbi.nlm.nih.gov/pubmed/33479460 http://dx.doi.org/10.1038/s41746-020-00380-6 |
_version_ | 1783639171328376832 |
---|---|
author | Young, Albert T.; Fernandez, Kristen; Pfau, Jacob; Reddy, Rasika; Cao, Nhat Anh; von Franque, Max Y.; Johal, Arjun; Wu, Benjamin V.; Wu, Rachel R.; Chen, Jennifer Y.; Fadadu, Raj P.; Vasquez, Juan A.; Tam, Andrew; Keiser, Michael J.; Wei, Maria L. |
author_facet | Young, Albert T.; Fernandez, Kristen; Pfau, Jacob; Reddy, Rasika; Cao, Nhat Anh; von Franque, Max Y.; Johal, Arjun; Wu, Benjamin V.; Wu, Rachel R.; Chen, Jennifer Y.; Fadadu, Raj P.; Vasquez, Juan A.; Tam, Andrew; Keiser, Michael J.; Wei, Maria L. |
author_sort | Young, Albert T. |
collection | PubMed |
description | Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related performance metric is ready for clinical use. Here, we systematically assessed the performance of dermatologist-level convolutional neural networks (CNNs) on real-world non-curated images by applying computational “stress tests”. Our goal was to create a proxy environment in which to comprehensively test the generalizability of off-the-shelf CNNs developed without training or evaluation protocols specific to individual clinics. We found inconsistent predictions on images captured repeatedly in the same setting or subjected to simple transformations (e.g., rotation). Such transformations resulted in false positive or negative predictions for 6.5–22% of skin lesions across test datasets. Our findings indicate that models meeting conventionally reported metrics need further validation with computational stress tests to assess clinic readiness. |
format | Online Article Text |
id | pubmed-7820258 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7820258 2021-01-28 Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models Young, Albert T. Fernandez, Kristen Pfau, Jacob Reddy, Rasika Cao, Nhat Anh von Franque, Max Y. Johal, Arjun Wu, Benjamin V. Wu, Rachel R. Chen, Jennifer Y. Fadadu, Raj P. Vasquez, Juan A. Tam, Andrew Keiser, Michael J. Wei, Maria L. NPJ Digit Med Article Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related performance metric is ready for clinical use. Here, we systematically assessed the performance of dermatologist-level convolutional neural networks (CNNs) on real-world non-curated images by applying computational “stress tests”. Our goal was to create a proxy environment in which to comprehensively test the generalizability of off-the-shelf CNNs developed without training or evaluation protocols specific to individual clinics. We found inconsistent predictions on images captured repeatedly in the same setting or subjected to simple transformations (e.g., rotation). Such transformations resulted in false positive or negative predictions for 6.5–22% of skin lesions across test datasets. Our findings indicate that models meeting conventionally reported metrics need further validation with computational stress tests to assess clinic readiness. Nature Publishing Group UK 2021-01-21 /pmc/articles/PMC7820258/ /pubmed/33479460 http://dx.doi.org/10.1038/s41746-020-00380-6 Text en © This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Young, Albert T. Fernandez, Kristen Pfau, Jacob Reddy, Rasika Cao, Nhat Anh von Franque, Max Y. Johal, Arjun Wu, Benjamin V. Wu, Rachel R. Chen, Jennifer Y. Fadadu, Raj P. Vasquez, Juan A. Tam, Andrew Keiser, Michael J. Wei, Maria L. Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models |
title | Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models |
title_full | Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models |
title_fullStr | Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models |
title_full_unstemmed | Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models |
title_short | Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models |
title_sort | stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820258/ https://www.ncbi.nlm.nih.gov/pubmed/33479460 http://dx.doi.org/10.1038/s41746-020-00380-6 |
work_keys_str_mv | AT youngalbertt stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT fernandezkristen stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT pfaujacob stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT reddyrasika stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT caonhatanh stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT vonfranquemaxy stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT johalarjun stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT wubenjaminv stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT wurachelr stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT chenjennifery stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT fadadurajp stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT vasquezjuana stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT tamandrew stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT keisermichaelj stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels AT weimarial stresstestingrevealsgapsinclinicreadinessofimagebaseddiagnosticartificialintelligencemodels |
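
The description field mentions computational "stress tests" in which simple transformations such as rotation change a model's prediction. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the 0.5 decision threshold, the restriction to 90-degree rotations, and the toy classifiers are all assumptions made for illustration. It reports the fraction of images whose binary label changes under at least one rotation.

```python
import numpy as np


def rotation_flip_rate(predict, images, threshold=0.5):
    """Fraction of images whose binary label changes under 90-degree rotations.

    `predict` is any callable mapping a 2-D array to a score in [0, 1].
    The threshold and the choice of rotations are illustrative assumptions.
    """
    flipped = 0
    for image in images:
        # Collect the binary labels obtained at 0, 90, 180 and 270 degrees.
        labels = {predict(np.rot90(image, k)) >= threshold for k in range(4)}
        if len(labels) > 1:  # at least one rotation changed the label
            flipped += 1
    return flipped / len(images)


def orientation_sensitive_model(image):
    """Hypothetical classifier: scores only the top half of the image."""
    half = image.shape[0] // 2
    return float(image[:half].mean())


def rotation_invariant_model(image):
    """Hypothetical classifier: scores the whole image, so rotation cannot matter."""
    return float(image.mean())


def example_images():
    """Three synthetic 4x4 'lesions': bright, dark, and bright-on-top only."""
    bright = np.full((4, 4), 0.9)
    dark = np.full((4, 4), 0.1)
    top_heavy = np.zeros((4, 4))
    top_heavy[:2] = 1.0
    return [bright, dark, top_heavy]
```

Calling `rotation_flip_rate(orientation_sensitive_model, example_images())` flags only the top-heavy image, while the rotation-invariant stand-in yields a flip rate of zero. A real evaluation would substitute a trained CNN for `predict` and clinical images for the synthetic arrays.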