Loss-Based Attention for Interpreting Image-Level Prediction of Convolutional Neural Networks

Bibliographic Details
Main authors: Shi, Xiaoshuang, Xing, Fuyong, Xu, Kaidi, Chen, Pingjun, Liang, Yun, Lu, Zhiyong, Guo, Zhenhua
Format: Online Article Text
Language: English
Published: 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9531187/
https://www.ncbi.nlm.nih.gov/pubmed/33382655
http://dx.doi.org/10.1109/TIP.2020.3046875
_version_ 1784801848499109888
author Shi, Xiaoshuang
Xing, Fuyong
Xu, Kaidi
Chen, Pingjun
Liang, Yun
Lu, Zhiyong
Guo, Zhenhua
author_facet Shi, Xiaoshuang
Xing, Fuyong
Xu, Kaidi
Chen, Pingjun
Liang, Yun
Lu, Zhiyong
Guo, Zhenhua
author_sort Shi, Xiaoshuang
collection PubMed
description Although deep neural networks have achieved great success on numerous large-scale tasks, poor interpretability is still a notorious obstacle for practical applications. In this paper, we propose a novel and general attention mechanism, loss-based attention, upon which we modify deep neural networks to mine significant image patches for explaining which parts determine the image decision-making. This is inspired by the fact that some patches contain significant objects or their parts for image-level decision. Unlike previous attention mechanisms that adopt different layers and parameters to learn weights and image prediction, the proposed loss-based attention mechanism mines significant patches by utilizing the same parameters to learn patch weights and logits (class vectors), and image prediction simultaneously, so as to connect the attention mechanism with the loss function for boosting the patch precision and recall. Additionally, different from previous popular networks that utilize max-pooling or stride operations in convolutional layers without considering the spatial relationship of features, the modified deep architectures first remove them to preserve the spatial relationship of image patches and greatly reduce their dependencies, and then add two convolutional or capsule layers to extract their features. With the learned patch weights, the image-level decision of the modified deep architectures is the weighted sum on patches. Extensive experiments on large-scale benchmark databases demonstrate that the proposed architectures can obtain better or competitive performance to state-of-the-art baseline networks with better interpretability.
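The core idea in the abstract, using the same parameters to produce both per-patch class logits and patch weights, with the image-level decision formed as the weighted sum over patches, can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the max-over-classes patch score and the softmax normalization over patches are assumptions made here for concreteness.

```python
import numpy as np

def loss_based_attention(patch_feats, W, b):
    """Sketch of loss-based attention.

    The SAME parameters (W, b) produce both the per-patch class
    logits and the patch weights, so the attention is tied directly
    to the classification loss rather than to a separate branch.

    patch_feats: (n_patches, d) feature vectors, one row per patch
    W: (d, n_classes) shared weight matrix; b: (n_classes,) bias
    Returns (image_logits, patch_weights).
    """
    # Per-patch class logits from the shared classifier parameters
    patch_logits = patch_feats @ W + b            # (n_patches, n_classes)

    # Assumed scoring rule: each patch is scored by its max class logit,
    # then scores are normalized over patches with a softmax
    scores = patch_logits.max(axis=1)
    scores = scores - scores.max()                # numerical stability
    patch_weights = np.exp(scores) / np.exp(scores).sum()

    # Image-level decision is the weighted sum over patch logits
    image_logits = patch_weights @ patch_logits   # (n_classes,)
    return image_logits, patch_weights
```

Because the weights are computed from the same logits that enter the loss, gradients on the image-level loss flow through both the prediction and the attention, which is what lets the mechanism boost patch precision and recall without extra attention parameters.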
format Online
Article
Text
id pubmed-9531187
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-9531187 2022-10-04 Loss-Based Attention for Interpreting Image-Level Prediction of Convolutional Neural Networks Shi, Xiaoshuang Xing, Fuyong Xu, Kaidi Chen, Pingjun Liang, Yun Lu, Zhiyong Guo, Zhenhua IEEE Trans Image Process Article
2021 2021-01-11 /pmc/articles/PMC9531187/ /pubmed/33382655 http://dx.doi.org/10.1109/TIP.2020.3046875 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
spellingShingle Article
Shi, Xiaoshuang
Xing, Fuyong
Xu, Kaidi
Chen, Pingjun
Liang, Yun
Lu, Zhiyong
Guo, Zhenhua
Loss-Based Attention for Interpreting Image-Level Prediction of Convolutional Neural Networks
title Loss-Based Attention for Interpreting Image-Level Prediction of Convolutional Neural Networks
title_full Loss-Based Attention for Interpreting Image-Level Prediction of Convolutional Neural Networks
title_fullStr Loss-Based Attention for Interpreting Image-Level Prediction of Convolutional Neural Networks
title_full_unstemmed Loss-Based Attention for Interpreting Image-Level Prediction of Convolutional Neural Networks
title_short Loss-Based Attention for Interpreting Image-Level Prediction of Convolutional Neural Networks
title_sort loss-based attention for interpreting image-level prediction of convolutional neural networks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9531187/
https://www.ncbi.nlm.nih.gov/pubmed/33382655
http://dx.doi.org/10.1109/TIP.2020.3046875
work_keys_str_mv AT shixiaoshuang lossbasedattentionforinterpretingimagelevelpredictionofconvolutionalneuralnetworks
AT xingfuyong lossbasedattentionforinterpretingimagelevelpredictionofconvolutionalneuralnetworks
AT xukaidi lossbasedattentionforinterpretingimagelevelpredictionofconvolutionalneuralnetworks
AT chenpingjun lossbasedattentionforinterpretingimagelevelpredictionofconvolutionalneuralnetworks
AT liangyun lossbasedattentionforinterpretingimagelevelpredictionofconvolutionalneuralnetworks
AT luzhiyong lossbasedattentionforinterpretingimagelevelpredictionofconvolutionalneuralnetworks
AT guozhenhua lossbasedattentionforinterpretingimagelevelpredictionofconvolutionalneuralnetworks