
Attention and feature transfer based knowledge distillation

Existing knowledge distillation (KD) methods are mainly based on features, logits, or attention, where features and logits represent the results of reasoning at different stages of a convolutional neural network, and attention maps symbolize the reasoning process. Because the two are continuous in time, transferring only one of them to the student network leads to unsatisfactory results. We study knowledge transfer between the teacher and student networks to different degrees, revealing the importance of simultaneously transferring knowledge about both the reasoning process and the reasoning results to the student network, and providing a new perspective for the study of KD. On this basis, we propose a knowledge distillation method based on attention and feature transfer (AFT-KD). First, we use transformation structures to convert intermediate features into attention and feature blocks (AFBs) that contain both inference-process and inference-outcome information, and force the student to learn the knowledge in the AFBs. To save computation during learning, we use block operations to align the teacher and student networks. In addition, to balance the decay rates of the different losses, we design an adaptive loss function based on the loss optimization rate. Experiments show that AFT-KD achieves state-of-the-art performance on multiple benchmarks.

Bibliographic Details
Main Authors: Yang, Guoliang, Yu, Shuaiying, Sheng, Yangyang, Yang, Hao
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603170/
https://www.ncbi.nlm.nih.gov/pubmed/37884556
http://dx.doi.org/10.1038/s41598-023-43986-y
_version_ 1785126548062339072
author Yang, Guoliang
Yu, Shuaiying
Sheng, Yangyang
Yang, Hao
author_facet Yang, Guoliang
Yu, Shuaiying
Sheng, Yangyang
Yang, Hao
author_sort Yang, Guoliang
collection PubMed
description Existing knowledge distillation (KD) methods are mainly based on features, logits, or attention, where features and logits represent the results of reasoning at different stages of a convolutional neural network, and attention maps symbolize the reasoning process. Because the two are continuous in time, transferring only one of them to the student network leads to unsatisfactory results. We study knowledge transfer between the teacher and student networks to different degrees, revealing the importance of simultaneously transferring knowledge about both the reasoning process and the reasoning results to the student network, and providing a new perspective for the study of KD. On this basis, we propose a knowledge distillation method based on attention and feature transfer (AFT-KD). First, we use transformation structures to convert intermediate features into attention and feature blocks (AFBs) that contain both inference-process and inference-outcome information, and force the student to learn the knowledge in the AFBs. To save computation during learning, we use block operations to align the teacher and student networks. In addition, to balance the decay rates of the different losses, we design an adaptive loss function based on the loss optimization rate. Experiments show that AFT-KD achieves state-of-the-art performance on multiple benchmarks.
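The description above outlines AFT-KD only at a high level. As a rough illustration of the attention-plus-feature transfer idea, the following PyTorch sketch computes activation-based spatial attention maps from intermediate features and combines an attention-matching term (reasoning process) with a feature-matching term (reasoning result) for one aligned teacher-student layer pair. The specific AFB construction, the transformation structures, and the names used here (attention_map, aft_loss, beta) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """Collapse a feature map (N, C, H, W) into an L2-normalized spatial
    attention map (N, H*W) by averaging squared channel activations."""
    att = feat.pow(2).mean(dim=1)   # (N, H, W): per-location activation energy
    att = att.flatten(1)            # (N, H*W)
    return F.normalize(att, dim=1)


def aft_loss(f_student: torch.Tensor, f_teacher: torch.Tensor,
             beta: float = 1.0) -> torch.Tensor:
    """Attention-transfer term (process) plus feature-matching term (result)
    for one teacher-student layer pair. Assumes channel counts already match,
    e.g. via a 1x1 convolution adapter on the student side."""
    if f_student.shape[2:] != f_teacher.shape[2:]:
        # Align spatial sizes; stands in for the paper's alignment operations.
        f_student = F.interpolate(f_student, size=f_teacher.shape[2:],
                                  mode="bilinear", align_corners=False)
    att_term = F.mse_loss(attention_map(f_student), attention_map(f_teacher))
    feat_term = F.mse_loss(F.normalize(f_student.flatten(1), dim=1),
                           F.normalize(f_teacher.flatten(1), dim=1))
    return att_term + beta * feat_term
```

The sketch handles only spatial-size mismatches; how AFT-KD actually aligns channel dimensions and groups layers into blocks is not specified in this record.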
format Online
Article
Text
id pubmed-10603170
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10603170 2023-10-28 Attention and feature transfer based knowledge distillation Yang, Guoliang Yu, Shuaiying Sheng, Yangyang Yang, Hao Sci Rep Article Existing knowledge distillation (KD) methods are mainly based on features, logits, or attention, where features and logits represent the results of reasoning at different stages of a convolutional neural network, and attention maps symbolize the reasoning process. Because the two are continuous in time, transferring only one of them to the student network leads to unsatisfactory results. We study knowledge transfer between the teacher and student networks to different degrees, revealing the importance of simultaneously transferring knowledge about both the reasoning process and the reasoning results to the student network, and providing a new perspective for the study of KD. On this basis, we propose a knowledge distillation method based on attention and feature transfer (AFT-KD). First, we use transformation structures to convert intermediate features into attention and feature blocks (AFBs) that contain both inference-process and inference-outcome information, and force the student to learn the knowledge in the AFBs. To save computation during learning, we use block operations to align the teacher and student networks. In addition, to balance the decay rates of the different losses, we design an adaptive loss function based on the loss optimization rate. Experiments show that AFT-KD achieves state-of-the-art performance on multiple benchmarks. Nature Publishing Group UK 2023-10-26 /pmc/articles/PMC10603170/ /pubmed/37884556 http://dx.doi.org/10.1038/s41598-023-43986-y Text en © The Author(s) 2023, corrected publication 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
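The record also mentions an adaptive loss function based on the loss optimization rate, without giving its form. The sketch below is one plausible reading under stated assumptions: each loss term's weight is tied to how little it has decayed relative to its initial value, so slowly optimizing terms are emphasized. The class name AdaptiveLossBalancer and the specific weighting rule are hypothetical, not the formulation used in AFT-KD.

```python
import torch


class AdaptiveLossBalancer:
    """Hypothetical loss balancer keyed to each term's optimization rate
    (current value relative to its first observed value)."""

    def __init__(self, num_terms: int):
        self.initial = [None] * num_terms  # first observed value per term

    def combine(self, losses):
        """Return a weighted sum in which slowly decaying losses receive
        larger weights. Weights are detached, so gradients flow only
        through the loss terms themselves."""
        rates = []
        for i, loss in enumerate(losses):
            value = loss.detach()
            if self.initial[i] is None:
                self.initial[i] = value.clamp_min(1e-8)
            # Optimization rate: how much of the initial loss remains.
            rates.append((value / self.initial[i]).clamp(0.0, 1.0))
        weights = torch.stack(rates)
        weights = weights / weights.sum().clamp_min(1e-8)  # normalize to 1
        return sum(w * l for w, l in zip(weights, losses))
```

For example, balancer.combine([attention_loss, feature_loss, task_loss]) would return a single scalar to backpropagate, with the mixing weights recomputed each step.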
spellingShingle Article
Yang, Guoliang
Yu, Shuaiying
Sheng, Yangyang
Yang, Hao
Attention and feature transfer based knowledge distillation
title Attention and feature transfer based knowledge distillation
title_full Attention and feature transfer based knowledge distillation
title_fullStr Attention and feature transfer based knowledge distillation
title_full_unstemmed Attention and feature transfer based knowledge distillation
title_short Attention and feature transfer based knowledge distillation
title_sort attention and feature transfer based knowledge distillation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603170/
https://www.ncbi.nlm.nih.gov/pubmed/37884556
http://dx.doi.org/10.1038/s41598-023-43986-y
work_keys_str_mv AT yangguoliang attentionandfeaturetransferbasedknowledgedistillation
AT yushuaiying attentionandfeaturetransferbasedknowledgedistillation
AT shengyangyang attentionandfeaturetransferbasedknowledgedistillation
AT yanghao attentionandfeaturetransferbasedknowledgedistillation