
Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms


Bibliographic Details
Main Authors: Li, Linfeng, Su, Weixing, Liu, Fang, He, Maowei, Liang, Xiaodan
Format: Online Article Text
Language: English
Published: Springer US 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9807430/
https://www.ncbi.nlm.nih.gov/pubmed/36619739
http://dx.doi.org/10.1007/s11063-022-11132-w
_version_ 1784862717033578496
author Li, Linfeng
Su, Weixing
Liu, Fang
He, Maowei
Liang, Xiaodan
author_facet Li, Linfeng
Su, Weixing
Liu, Fang
He, Maowei
Liang, Xiaodan
author_sort Li, Linfeng
collection PubMed
description The success of deep learning has brought breakthroughs in many fields. However, the increased performance of deep learning models is often accompanied by an increase in their depth and width, which conflicts with the storage, energy consumption, and computational power of edge devices. Knowledge distillation, as an effective model compression method, can transfer knowledge from complex teacher models to student models. Self-distillation is a special type of knowledge distillation, which does not require a pre-trained teacher model. However, existing self-distillation methods rarely consider how to effectively use the early features of the model. Furthermore, most self-distillation methods use features from the deepest layers of the network to guide the training of the branches of the network, which we find is not the optimal choice. In this paper, we found that the feature maps obtained by early feature fusion do not serve as a good teacher to guide their own training. Based on this, we propose a selective feature fusion module and further obtain a new self-distillation method, knowledge fusion distillation. Extensive experiments on three datasets have demonstrated that our method has comparable performance to state-of-the-art distillation methods. In addition, the performance of the network can be further enhanced when fused features are integrated into the network.
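For context, the abstract describes a self-distillation objective in which an internal teacher signal (here, features fused from multiple depths) guides the network's shallower branches. Below is a minimal sketch of a generic self-distillation loss of that kind, assuming the standard soft-label formulation; the function name, `alpha`, and `temperature` are illustrative assumptions and do not reproduce the paper's actual knowledge fusion distillation implementation or its selective feature fusion module.

```python
# Minimal sketch of a generic self-distillation loss (assumption: standard
# cross-entropy + temperature-softened KL formulation, not the paper's exact method).
import torch
import torch.nn.functional as F

def self_distillation_loss(branch_logits, teacher_logits, labels,
                           alpha=0.5, temperature=4.0):
    """Combine hard-label cross-entropy with a soft-target loss from a teacher signal.

    In typical self-distillation the teacher_logits come from the deepest classifier
    of the same network; the paper argues that a (selectively) fused feature can
    provide a better teacher signal than the deepest layer alone.
    """
    # Supervised loss on the branch's own predictions.
    ce = F.cross_entropy(branch_logits, labels)

    # Soft-target loss: KL divergence between temperature-softened distributions.
    soft_student = F.log_softmax(branch_logits / temperature, dim=1)
    soft_teacher = F.softmax(teacher_logits.detach() / temperature, dim=1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    return (1.0 - alpha) * ce + alpha * kd
```

In a self-distillation setup this loss would be applied to each auxiliary branch, with `teacher_logits` produced by the internal teacher (fused features) rather than by a separate pre-trained model.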
format Online
Article
Text
id pubmed-9807430
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-9807430 2023-01-04 Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms Li, Linfeng Su, Weixing Liu, Fang He, Maowei Liang, Xiaodan Neural Process Lett Article The success of deep learning has brought breakthroughs in many fields. However, the increased performance of deep learning models is often accompanied by an increase in their depth and width, which conflicts with the storage, energy consumption, and computational power of edge devices. Knowledge distillation, as an effective model compression method, can transfer knowledge from complex teacher models to student models. Self-distillation is a special type of knowledge distillation, which does not require a pre-trained teacher model. However, existing self-distillation methods rarely consider how to effectively use the early features of the model. Furthermore, most self-distillation methods use features from the deepest layers of the network to guide the training of the branches of the network, which we find is not the optimal choice. In this paper, we found that the feature maps obtained by early feature fusion do not serve as a good teacher to guide their own training. Based on this, we propose a selective feature fusion module and further obtain a new self-distillation method, knowledge fusion distillation. Extensive experiments on three datasets have demonstrated that our method has comparable performance to state-of-the-art distillation methods. In addition, the performance of the network can be further enhanced when fused features are integrated into the network. Springer US 2023-01-03 /pmc/articles/PMC9807430/ /pubmed/36619739 http://dx.doi.org/10.1007/s11063-022-11132-w Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Li, Linfeng
Su, Weixing
Liu, Fang
He, Maowei
Liang, Xiaodan
Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms
title Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms
title_full Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms
title_fullStr Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms
title_full_unstemmed Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms
title_short Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms
title_sort knowledge fusion distillation: improving distillation with multi-scale attention mechanisms
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9807430/
https://www.ncbi.nlm.nih.gov/pubmed/36619739
http://dx.doi.org/10.1007/s11063-022-11132-w
work_keys_str_mv AT lilinfeng knowledgefusiondistillationimprovingdistillationwithmultiscaleattentionmechanisms
AT suweixing knowledgefusiondistillationimprovingdistillationwithmultiscaleattentionmechanisms
AT liufang knowledgefusiondistillationimprovingdistillationwithmultiscaleattentionmechanisms
AT hemaowei knowledgefusiondistillationimprovingdistillationwithmultiscaleattentionmechanisms
AT liangxiaodan knowledgefusiondistillationimprovingdistillationwithmultiscaleattentionmechanisms