Attention and feature transfer based knowledge distillation
Existing knowledge distillation (KD) methods are mainly based on features, logits, or attention, where features and logits represent the results of reasoning at different stages of a convolutional neural network, and attention maps symbolize the reasoning process. Because the two are continuous in time...
Main Authors: | Yang, Guoliang, Yu, Shuaiying, Sheng, Yangyang, Yang, Hao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2023 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603170/ https://www.ncbi.nlm.nih.gov/pubmed/37884556 http://dx.doi.org/10.1038/s41598-023-43986-y |
_version_ | 1785126548062339072 |
---|---|
author | Yang, Guoliang Yu, Shuaiying Sheng, Yangyang Yang, Hao |
author_facet | Yang, Guoliang Yu, Shuaiying Sheng, Yangyang Yang, Hao |
author_sort | Yang, Guoliang |
collection | PubMed |
description | Existing knowledge distillation (KD) methods are mainly based on features, logits, or attention, where features and logits represent the results of reasoning at different stages of a convolutional neural network, and attention maps symbolize the reasoning process. Because the two are continuous in time, transferring only one of them to the student network leads to unsatisfactory results. We study knowledge transfer between the teacher and student networks to different degrees, revealing the importance of simultaneously transferring knowledge related to both the reasoning process and the reasoning results to the student network, and providing a new perspective for the study of KD. On this basis, we propose a knowledge distillation method based on attention and feature transfer (AFT-KD). First, we use transformation structures to transform intermediate features into attention and feature blocks (AFBs) that contain both inference-process and inference-outcome information, and force the student to learn the knowledge in the AFBs. To save computation during learning, we use block operations to align the teacher and student networks. In addition, to balance the attenuation rates of the different losses, we design an adaptive loss function based on the loss optimization rate. Experiments show that AFT-KD achieves state-of-the-art performance on multiple benchmarks. |
format | Online Article Text |
id | pubmed-10603170 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10603170 2023-10-28 Attention and feature transfer based knowledge distillation Yang, Guoliang Yu, Shuaiying Sheng, Yangyang Yang, Hao Sci Rep Article Existing knowledge distillation (KD) methods are mainly based on features, logits, or attention, where features and logits represent the results of reasoning at different stages of a convolutional neural network, and attention maps symbolize the reasoning process. Because the two are continuous in time, transferring only one of them to the student network leads to unsatisfactory results. We study knowledge transfer between the teacher and student networks to different degrees, revealing the importance of simultaneously transferring knowledge related to both the reasoning process and the reasoning results to the student network, and providing a new perspective for the study of KD. On this basis, we propose a knowledge distillation method based on attention and feature transfer (AFT-KD). First, we use transformation structures to transform intermediate features into attention and feature blocks (AFBs) that contain both inference-process and inference-outcome information, and force the student to learn the knowledge in the AFBs. To save computation during learning, we use block operations to align the teacher and student networks. In addition, to balance the attenuation rates of the different losses, we design an adaptive loss function based on the loss optimization rate. Experiments show that AFT-KD achieves state-of-the-art performance on multiple benchmarks. Nature Publishing Group UK 2023-10-26 /pmc/articles/PMC10603170/ /pubmed/37884556 http://dx.doi.org/10.1038/s41598-023-43986-y Text en © The Author(s) 2023, corrected publication 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | Article Yang, Guoliang Yu, Shuaiying Sheng, Yangyang Yang, Hao Attention and feature transfer based knowledge distillation |
title | Attention and feature transfer based knowledge distillation |
title_full | Attention and feature transfer based knowledge distillation |
title_fullStr | Attention and feature transfer based knowledge distillation |
title_full_unstemmed | Attention and feature transfer based knowledge distillation |
title_short | Attention and feature transfer based knowledge distillation |
title_sort | attention and feature transfer based knowledge distillation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603170/ https://www.ncbi.nlm.nih.gov/pubmed/37884556 http://dx.doi.org/10.1038/s41598-023-43986-y |
work_keys_str_mv | AT yangguoliang attentionandfeaturetransferbasedknowledgedistillation AT yushuaiying attentionandfeaturetransferbasedknowledgedistillation AT shengyangyang attentionandfeaturetransferbasedknowledgedistillation AT yanghao attentionandfeaturetransferbasedknowledgedistillation |
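
The description field above outlines the core idea of AFT-KD: transferring both an attention map (the reasoning process) and aligned intermediate features (the reasoning result) from teacher to student. The PyTorch sketch below illustrates that general idea only; it is not the authors' AFT-KD code, and the attention definition, the 1x1 alignment convolution, and the simple summed loss are assumptions made for demonstration.

```python
# Illustrative sketch only: NOT the authors' AFT-KD implementation.
# It combines an attention-map loss (reasoning process) with a feature loss
# (reasoning result), the general combination the abstract describes.
import torch
import torch.nn as nn
import torch.nn.functional as F


def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """Spatial attention: channel-wise mean of squared activations, L2-normalised."""
    att = feat.pow(2).mean(dim=1)              # (N, H, W)
    return F.normalize(att.flatten(1), dim=1)  # (N, H*W)


class AttentionFeatureLoss(nn.Module):
    """Distils both an attention map and the channel-aligned feature tensor."""

    def __init__(self, s_channels: int, t_channels: int):
        super().__init__()
        # 1x1 conv aligns student channels to teacher channels (assumed choice).
        self.align = nn.Conv2d(s_channels, t_channels, kernel_size=1)

    def forward(self, f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        f_s = self.align(f_s)
        if f_s.shape[-2:] != f_t.shape[-2:]:
            # Match spatial size if the stage resolutions differ.
            f_s = F.interpolate(f_s, size=f_t.shape[-2:], mode="bilinear",
                                align_corners=False)
        att_loss = F.mse_loss(attention_map(f_s), attention_map(f_t))
        feat_loss = F.mse_loss(f_s, f_t)
        # The paper describes an adaptive weighting based on loss optimization
        # rates; a plain sum is used here purely for illustration.
        return att_loss + feat_loss


# Example usage with random tensors standing in for intermediate features.
if __name__ == "__main__":
    student_feat = torch.randn(8, 64, 16, 16)
    teacher_feat = torch.randn(8, 256, 16, 16)
    criterion = AttentionFeatureLoss(s_channels=64, t_channels=256)
    print(criterion(student_feat, teacher_feat).item())
```

In practice this loss would be computed at one or more intermediate stages and added to the usual task and logit-distillation losses; how AFT-KD forms its attention and feature blocks and balances the loss terms is detailed in the article linked above.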