
Attention and feature transfer based knowledge distillation

Existing knowledge distillation (KD) methods are mainly based on features, logits, or attention, where features and logits represent the results of reasoning at different stages of a convolutional neural network, and attention maps symbolize the reasoning process. Because the two are continuous in time, transferring only one of them to the student network leads to unsatisfactory results. We study knowledge transfer between the teacher and student networks to different degrees, revealing the importance of simultaneously transferring knowledge about both the reasoning process and the reasoning results to the student network, and providing a new perspective for the study of KD. On this basis, we propose a knowledge distillation method based on attention and feature transfer (AFT-KD). First, we use transformation structures to convert intermediate features into attention and feature blocks (AFBs) that contain both inference-process and inference-outcome information, and force the student to learn the knowledge in the AFBs. To save computation during learning, we use block operations to align the teacher and student networks. In addition, to balance the decay rates of the different losses, we design an adaptive loss function based on the loss optimization rate. Experiments show that AFT-KD achieves state-of-the-art performance on multiple benchmarks.

Bibliographic Details
Main Authors: Yang, Guoliang, Yu, Shuaiying, Sheng, Yangyang, Yang, Hao
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603170/
https://www.ncbi.nlm.nih.gov/pubmed/37884556
http://dx.doi.org/10.1038/s41598-023-43986-y
_version_ 1785126548062339072
author Yang, Guoliang
Yu, Shuaiying
Sheng, Yangyang
Yang, Hao
author_facet Yang, Guoliang
Yu, Shuaiying
Sheng, Yangyang
Yang, Hao
author_sort Yang, Guoliang
collection PubMed
description Existing knowledge distillation (KD) methods are mainly based on features, logits, or attention, where features and logits represent the results of reasoning at different stages of a convolutional neural network, and attention maps symbolize the reasoning process. Because the two are continuous in time, transferring only one of them to the student network leads to unsatisfactory results. We study knowledge transfer between the teacher and student networks to different degrees, revealing the importance of simultaneously transferring knowledge about both the reasoning process and the reasoning results to the student network, and providing a new perspective for the study of KD. On this basis, we propose a knowledge distillation method based on attention and feature transfer (AFT-KD). First, we use transformation structures to convert intermediate features into attention and feature blocks (AFBs) that contain both inference-process and inference-outcome information, and force the student to learn the knowledge in the AFBs. To save computation during learning, we use block operations to align the teacher and student networks. In addition, to balance the decay rates of the different losses, we design an adaptive loss function based on the loss optimization rate. Experiments show that AFT-KD achieves state-of-the-art performance on multiple benchmarks.
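The description above outlines AFT-KD only at a high level. As a rough illustration of the attention-plus-feature transfer idea, the following PyTorch sketch computes activation-based spatial attention maps from intermediate features and combines an attention-matching term (reasoning process) with a feature-matching term (reasoning result) for one aligned teacher-student layer pair. The specific AFB construction, the transformation structures, and the names used here (attention_map, aft_loss, beta) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """Collapse a feature map (N, C, H, W) into an L2-normalized spatial
    attention map (N, H*W) by averaging squared channel activations."""
    att = feat.pow(2).mean(dim=1)   # (N, H, W): per-location activation energy
    att = att.flatten(1)            # (N, H*W)
    return F.normalize(att, dim=1)


def aft_loss(f_student: torch.Tensor, f_teacher: torch.Tensor,
             beta: float = 1.0) -> torch.Tensor:
    """Attention-transfer term (process) plus feature-matching term (result)
    for one teacher-student layer pair. Assumes channel counts already match,
    e.g. via a 1x1 convolution adapter on the student side."""
    if f_student.shape[2:] != f_teacher.shape[2:]:
        # Align spatial sizes; stands in for the paper's alignment operations.
        f_student = F.interpolate(f_student, size=f_teacher.shape[2:],
                                  mode="bilinear", align_corners=False)
    att_term = F.mse_loss(attention_map(f_student), attention_map(f_teacher))
    feat_term = F.mse_loss(F.normalize(f_student.flatten(1), dim=1),
                           F.normalize(f_teacher.flatten(1), dim=1))
    return att_term + beta * feat_term
```

The sketch handles only spatial-size mismatches; how AFT-KD actually aligns channel dimensions and groups layers into blocks is not specified in this record.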
format Online
Article
Text
id pubmed-10603170
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10603170 2023-10-28 Attention and feature transfer based knowledge distillation Yang, Guoliang Yu, Shuaiying Sheng, Yangyang Yang, Hao Sci Rep Article Existing knowledge distillation (KD) methods are mainly based on features, logits, or attention, where features and logits represent the results of reasoning at different stages of a convolutional neural network, and attention maps symbolize the reasoning process. Because the two are continuous in time, transferring only one of them to the student network leads to unsatisfactory results. We study knowledge transfer between the teacher and student networks to different degrees, revealing the importance of simultaneously transferring knowledge about both the reasoning process and the reasoning results to the student network, and providing a new perspective for the study of KD. On this basis, we propose a knowledge distillation method based on attention and feature transfer (AFT-KD). First, we use transformation structures to convert intermediate features into attention and feature blocks (AFBs) that contain both inference-process and inference-outcome information, and force the student to learn the knowledge in the AFBs. To save computation during learning, we use block operations to align the teacher and student networks. In addition, to balance the decay rates of the different losses, we design an adaptive loss function based on the loss optimization rate. Experiments show that AFT-KD achieves state-of-the-art performance on multiple benchmarks. Nature Publishing Group UK 2023-10-26 /pmc/articles/PMC10603170/ /pubmed/37884556 http://dx.doi.org/10.1038/s41598-023-43986-y Text en © The Author(s) 2023, corrected publication 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
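The record also mentions an adaptive loss function based on the loss optimization rate, without giving its form. The sketch below is one plausible reading under stated assumptions: each loss term's weight is tied to how little it has decayed relative to its initial value, so slowly optimizing terms are emphasized. The class name AdaptiveLossBalancer and the specific weighting rule are hypothetical, not the formulation used in AFT-KD.

```python
import torch


class AdaptiveLossBalancer:
    """Hypothetical loss balancer keyed to each term's optimization rate
    (current value relative to its first observed value)."""

    def __init__(self, num_terms: int):
        self.initial = [None] * num_terms  # first observed value per term

    def combine(self, losses):
        """Return a weighted sum in which slowly decaying losses receive
        larger weights. Weights are detached, so gradients flow only
        through the loss terms themselves."""
        rates = []
        for i, loss in enumerate(losses):
            value = loss.detach()
            if self.initial[i] is None:
                self.initial[i] = value.clamp_min(1e-8)
            # Optimization rate: how much of the initial loss remains.
            rates.append((value / self.initial[i]).clamp(0.0, 1.0))
        weights = torch.stack(rates)
        weights = weights / weights.sum().clamp_min(1e-8)  # normalize to 1
        return sum(w * l for w, l in zip(weights, losses))
```

For example, balancer.combine([attention_loss, feature_loss, task_loss]) would return a single scalar to backpropagate, with the mixing weights recomputed each step.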
spellingShingle Article
Yang, Guoliang
Yu, Shuaiying
Sheng, Yangyang
Yang, Hao
Attention and feature transfer based knowledge distillation
title Attention and feature transfer based knowledge distillation
title_full Attention and feature transfer based knowledge distillation
title_fullStr Attention and feature transfer based knowledge distillation
title_full_unstemmed Attention and feature transfer based knowledge distillation
title_short Attention and feature transfer based knowledge distillation
title_sort attention and feature transfer based knowledge distillation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603170/
https://www.ncbi.nlm.nih.gov/pubmed/37884556
http://dx.doi.org/10.1038/s41598-023-43986-y
work_keys_str_mv AT yangguoliang attentionandfeaturetransferbasedknowledgedistillation
AT yushuaiying attentionandfeaturetransferbasedknowledgedistillation
AT shengyangyang attentionandfeaturetransferbasedknowledgedistillation
AT yanghao attentionandfeaturetransferbasedknowledgedistillation