Sources of performance variability in deep learning-based polyp detection
PURPOSE: Validation metrics are a key prerequisite for the reliable tracking of scientific progress and for deciding on the potential clinical translation of methods. While recent initiatives aim to develop comprehensive theoretical frameworks for understanding metric-related pitfalls in image analy...
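Main Authors: | Tran, T. N., Adler, T. J., Yamlahi, A., Christodoulou, E., Godau, P., Reinke, A., Tizabi, M. D., Sauer, P., Persicke, T., Albert, J. G., Maier-Hein, L. |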
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2023 |
Subjects: | Original Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10329574/ https://www.ncbi.nlm.nih.gov/pubmed/37266886 http://dx.doi.org/10.1007/s11548-023-02936-9 |
_version_ | 1785070047617613824 |
---|---|
author | Tran, T. N. Adler, T. J. Yamlahi, A. Christodoulou, E. Godau, P. Reinke, A. Tizabi, M. D. Sauer, P. Persicke, T. Albert, J. G. Maier-Hein, L. |
author_facet | Tran, T. N. Adler, T. J. Yamlahi, A. Christodoulou, E. Godau, P. Reinke, A. Tizabi, M. D. Sauer, P. Persicke, T. Albert, J. G. Maier-Hein, L. |
author_sort | Tran, T. N. |
collection | PubMed |
description | PURPOSE: Validation metrics are a key prerequisite for the reliable tracking of scientific progress and for deciding on the potential clinical translation of methods. While recent initiatives aim to develop comprehensive theoretical frameworks for understanding metric-related pitfalls in image analysis problems, there is a lack of experimental evidence on the concrete effects of common and rare pitfalls on specific applications. We address this gap in the literature in the context of colon cancer screening. METHODS: Our contribution is twofold. Firstly, we present the winning solution of the Endoscopy Computer Vision Challenge on colon cancer detection, conducted in conjunction with the IEEE International Symposium on Biomedical Imaging 2022. Secondly, we demonstrate the sensitivity of commonly used metrics to a range of hyperparameters as well as the consequences of poor metric choices. RESULTS: Based on comprehensive validation studies performed with patient data from six clinical centers, we found all commonly applied object detection metrics to be subject to high inter-center variability. Furthermore, our results clearly demonstrate that the adaptation of standard hyperparameters used in the computer vision community does not generally lead to the clinically most plausible results. Finally, we present localization criteria that correspond well to clinical relevance. CONCLUSION: We conclude from our study that (1) performance results in polyp detection are highly sensitive to various design choices, (2) common metric configurations do not reflect the clinical need and rely on suboptimal hyperparameters and (3) comparison of performance across datasets can be largely misleading. Our work could be a first step towards reconsidering common validation strategies in deep learning-based colonoscopy and beyond. |
format | Online Article Text |
id | pubmed-10329574 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-10329574 2023-07-10 Sources of performance variability in deep learning-based polyp detection Tran, T. N. Adler, T. J. Yamlahi, A. Christodoulou, E. Godau, P. Reinke, A. Tizabi, M. D. Sauer, P. Persicke, T. Albert, J. G. Maier-Hein, L. Int J Comput Assist Radiol Surg Original Article PURPOSE: Validation metrics are a key prerequisite for the reliable tracking of scientific progress and for deciding on the potential clinical translation of methods. While recent initiatives aim to develop comprehensive theoretical frameworks for understanding metric-related pitfalls in image analysis problems, there is a lack of experimental evidence on the concrete effects of common and rare pitfalls on specific applications. We address this gap in the literature in the context of colon cancer screening. METHODS: Our contribution is twofold. Firstly, we present the winning solution of the Endoscopy Computer Vision Challenge on colon cancer detection, conducted in conjunction with the IEEE International Symposium on Biomedical Imaging 2022. Secondly, we demonstrate the sensitivity of commonly used metrics to a range of hyperparameters as well as the consequences of poor metric choices. RESULTS: Based on comprehensive validation studies performed with patient data from six clinical centers, we found all commonly applied object detection metrics to be subject to high inter-center variability. Furthermore, our results clearly demonstrate that the adaptation of standard hyperparameters used in the computer vision community does not generally lead to the clinically most plausible results. Finally, we present localization criteria that correspond well to clinical relevance. CONCLUSION: We conclude from our study that (1) performance results in polyp detection are highly sensitive to various design choices, (2) common metric configurations do not reflect the clinical need and rely on suboptimal hyperparameters and (3) comparison of performance across datasets can be largely misleading. Our work could be a first step towards reconsidering common validation strategies in deep learning-based colonoscopy and beyond. Springer International Publishing 2023-06-02 2023 /pmc/articles/PMC10329574/ /pubmed/37266886 http://dx.doi.org/10.1007/s11548-023-02936-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Original Article Tran, T. N. Adler, T. J. Yamlahi, A. Christodoulou, E. Godau, P. Reinke, A. Tizabi, M. D. Sauer, P. Persicke, T. Albert, J. G. Maier-Hein, L. Sources of performance variability in deep learning-based polyp detection |
title | Sources of performance variability in deep learning-based polyp detection |
title_full | Sources of performance variability in deep learning-based polyp detection |
title_fullStr | Sources of performance variability in deep learning-based polyp detection |
title_full_unstemmed | Sources of performance variability in deep learning-based polyp detection |
title_short | Sources of performance variability in deep learning-based polyp detection |
title_sort | sources of performance variability in deep learning-based polyp detection |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10329574/ https://www.ncbi.nlm.nih.gov/pubmed/37266886 http://dx.doi.org/10.1007/s11548-023-02936-9 |
work_keys_str_mv | AT trantn sourcesofperformancevariabilityindeeplearningbasedpolypdetection AT adlertj sourcesofperformancevariabilityindeeplearningbasedpolypdetection AT yamlahia sourcesofperformancevariabilityindeeplearningbasedpolypdetection AT christodouloue sourcesofperformancevariabilityindeeplearningbasedpolypdetection AT godaup sourcesofperformancevariabilityindeeplearningbasedpolypdetection AT reinkea sourcesofperformancevariabilityindeeplearningbasedpolypdetection AT tizabimd sourcesofperformancevariabilityindeeplearningbasedpolypdetection AT sauerp sourcesofperformancevariabilityindeeplearningbasedpolypdetection AT persicket sourcesofperformancevariabilityindeeplearningbasedpolypdetection AT albertjg sourcesofperformancevariabilityindeeplearningbasedpolypdetection AT maierheinl sourcesofperformancevariabilityindeeplearningbasedpolypdetection |