
Sources of performance variability in deep learning-based polyp detection

PURPOSE: Validation metrics are a key prerequisite for the reliable tracking of scientific progress and for deciding on the potential clinical translation of methods. While recent initiatives aim to develop comprehensive theoretical frameworks for understanding metric-related pitfalls in image analy...

Bibliographic Details
Main Authors: Tran, T. N., Adler, T. J., Yamlahi, A., Christodoulou, E., Godau, P., Reinke, A., Tizabi, M. D., Sauer, P., Persicke, T., Albert, J. G., Maier-Hein, L.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10329574/
https://www.ncbi.nlm.nih.gov/pubmed/37266886
http://dx.doi.org/10.1007/s11548-023-02936-9
author Tran, T. N.
Adler, T. J.
Yamlahi, A.
Christodoulou, E.
Godau, P.
Reinke, A.
Tizabi, M. D.
Sauer, P.
Persicke, T.
Albert, J. G.
Maier-Hein, L.
collection PubMed
description PURPOSE: Validation metrics are a key prerequisite for the reliable tracking of scientific progress and for deciding on the potential clinical translation of methods. While recent initiatives aim to develop comprehensive theoretical frameworks for understanding metric-related pitfalls in image analysis problems, there is a lack of experimental evidence on the concrete effects of common and rare pitfalls on specific applications. We address this gap in the literature in the context of colon cancer screening. METHODS: Our contribution is twofold. Firstly, we present the winning solution of the Endoscopy Computer Vision Challenge on colon cancer detection, conducted in conjunction with the IEEE International Symposium on Biomedical Imaging 2022. Secondly, we demonstrate the sensitivity of commonly used metrics to a range of hyperparameters as well as the consequences of poor metric choices. RESULTS: Based on comprehensive validation studies performed with patient data from six clinical centers, we found all commonly applied object detection metrics to be subject to high inter-center variability. Furthermore, our results clearly demonstrate that the adaptation of standard hyperparameters used in the computer vision community does not generally lead to the clinically most plausible results. Finally, we present localization criteria that correspond well to clinical relevance. CONCLUSION: We conclude from our study that (1) performance results in polyp detection are highly sensitive to various design choices, (2) common metric configurations do not reflect the clinical need and rely on suboptimal hyperparameters and (3) comparison of performance across datasets can be largely misleading. Our work could be a first step towards reconsidering common validation strategies in deep learning-based colonoscopy and beyond.
format Online
Article
Text
id pubmed-10329574
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-10329574 2023-07-10 Sources of performance variability in deep learning-based polyp detection Tran, T. N. Adler, T. J. Yamlahi, A. Christodoulou, E. Godau, P. Reinke, A. Tizabi, M. D. Sauer, P. Persicke, T. Albert, J. G. Maier-Hein, L. Int J Comput Assist Radiol Surg Original Article Springer International Publishing 2023-06-02 2023 /pmc/articles/PMC10329574/ /pubmed/37266886 http://dx.doi.org/10.1007/s11548-023-02936-9 Text en © The Author(s) 2023 Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Sources of performance variability in deep learning-based polyp detection
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10329574/
https://www.ncbi.nlm.nih.gov/pubmed/37266886
http://dx.doi.org/10.1007/s11548-023-02936-9