Cargando…

Systematic Comparison of Heatmapping Techniques in Deep Learning in the Context of Diabetic Retinopathy Lesion Detection

PURPOSE: Heatmapping techniques can support explainability of deep learning (DL) predictions in medical image analysis. However, individual techniques have been mainly applied in a descriptive way without an objective and systematic evaluation. We investigated comparative performances using diabetic...

Descripción completa

Detalles Bibliográficos
Autores principales: Van Craenendonck, Toon, Elen, Bart, Gerrits, Nele, De Boever, Patrick
Formato: Online Artículo Texto
Lenguaje:English
Publicado: The Association for Research in Vision and Ophthalmology 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7774113/
https://www.ncbi.nlm.nih.gov/pubmed/33403156
http://dx.doi.org/10.1167/tvst.9.2.64
Descripción
Sumario:PURPOSE: Heatmapping techniques can support explainability of deep learning (DL) predictions in medical image analysis. However, individual techniques have been mainly applied in a descriptive way without an objective and systematic evaluation. We investigated comparative performances using diabetic retinopathy lesion detection as a benchmark task. METHODS: The Indian Diabetic Retinopathy Image Dataset (IDRiD) publicly available database contains fundus images of diabetes patients with pixel level annotations of diabetic retinopathy (DR) lesions, the ground truth for this study. Three in advance trained DL models (ResNet50, VGG16 or InceptionV3) were used for DR detection in these images. Next, explainability was visualized with each of the 10 most used heatmapping techniques. The quantitative correspondence between the output of a heatmap and the ground truth was evaluated with the Explainability Consistency Score (ECS), a metric between 0 and 1, developed for this comparative task. RESULTS: In case of the overall DR lesions detection, the ECS ranged from 0.21 to 0.51 for all model/heatmapping combinations. The highest score was for VGG16+Grad-CAM (ECS = 0.51; 95% confidence interval [CI]: [0.46; 0.55]). For individual lesions, VGG16+Grad-CAM performed best on hemorrhages and hard exudates. ResNet50+SmoothGrad performed best for soft exudates and ResNet50+Guided Backpropagation performed best for microaneurysms. CONCLUSIONS: Our empirical evaluation on the IDRiD database demonstrated that the combination DL model/heatmapping affects explainability when considering common DR lesions. Our approach found considerable disagreement between regions highlighted by heatmaps and expert annotations. TRANSLATIONAL RELEVANCE: We warrant a more systematic investigation and analysis of heatmaps for reliable explanation of image-based predictions of deep learning models.