Knowledge distillation based on multi-layer fusion features
Main Authors: | Tan, Shengyuan; Guo, Rongzuo; Tang, Jialiang; Jiang, Ning; Zou, Junying |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2023 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10461825/ https://www.ncbi.nlm.nih.gov/pubmed/37639443 http://dx.doi.org/10.1371/journal.pone.0285901 |
_version_ | 1785097917736943616 |
---|---|
author | Tan, Shengyuan Guo, Rongzuo Tang, Jialiang Jiang, Ning Zou, Junying |
author_facet | Tan, Shengyuan Guo, Rongzuo Tang, Jialiang Jiang, Ning Zou, Junying |
author_sort | Tan, Shengyuan |
collection | PubMed |
description | Knowledge distillation improves the performance of a small student network by encouraging it to learn knowledge from a pre-trained, high-performance but bulky teacher network. Most current knowledge distillation methods extract relatively simple features from the middle or bottom layers of the teacher network for knowledge transfer. However, these methods ignore feature fusion, even though fused features contain richer information. We believe that the richer and better the information in the knowledge a teacher delivers to its student, the easier it is for the student to perform well. In this paper, we propose a new method called Multi-feature Fusion Knowledge Distillation (MFKD) to extract and utilize expressive fused features of the teacher network. Specifically, we extract feature maps from different positions in the network, i.e., the middle layers, the bottom layers, and even the front layers. To utilize these features properly, the method designs a multi-feature fusion scheme that integrates them into a single representation. Compared with features extracted from a single location in the teacher network, the final fused feature map contains more meaningful information. Extensive experiments on image classification tasks demonstrate that a student network trained with our MFKD learns from the fused features and achieves superior performance. The results show that MFKD improves the Top-1 accuracy of ResNet20 and VGG8 by 1.82% and 3.35%, respectively, on the CIFAR-100 dataset, outperforming many state-of-the-art methods. (A minimal, illustrative sketch of the multi-layer fusion idea appears after this record.) |
format | Online Article Text |
id | pubmed-10461825 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-104618252023-08-29 Knowledge distillation based on multi-layer fusion features Tan, Shengyuan Guo, Rongzuo Tang, Jialiang Jiang, Ning Zou, Junying PLoS One Research Article Knowledge distillation improves the performance of a small student network by encouraging it to learn knowledge from a pre-trained, high-performance but bulky teacher network. Most current knowledge distillation methods extract relatively simple features from the middle or bottom layers of the teacher network for knowledge transfer. However, these methods ignore feature fusion, even though fused features contain richer information. We believe that the richer and better the information in the knowledge a teacher delivers to its student, the easier it is for the student to perform well. In this paper, we propose a new method called Multi-feature Fusion Knowledge Distillation (MFKD) to extract and utilize expressive fused features of the teacher network. Specifically, we extract feature maps from different positions in the network, i.e., the middle layers, the bottom layers, and even the front layers. To utilize these features properly, the method designs a multi-feature fusion scheme that integrates them into a single representation. Compared with features extracted from a single location in the teacher network, the final fused feature map contains more meaningful information. Extensive experiments on image classification tasks demonstrate that a student network trained with our MFKD learns from the fused features and achieves superior performance. The results show that MFKD improves the Top-1 accuracy of ResNet20 and VGG8 by 1.82% and 3.35%, respectively, on the CIFAR-100 dataset, outperforming many state-of-the-art methods. Public Library of Science 2023-08-28 /pmc/articles/PMC10461825/ /pubmed/37639443 http://dx.doi.org/10.1371/journal.pone.0285901 Text en © 2023 Tan et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Tan, Shengyuan Guo, Rongzuo Tang, Jialiang Jiang, Ning Zou, Junying Knowledge distillation based on multi-layer fusion features |
title | Knowledge distillation based on multi-layer fusion features |
title_full | Knowledge distillation based on multi-layer fusion features |
title_fullStr | Knowledge distillation based on multi-layer fusion features |
title_full_unstemmed | Knowledge distillation based on multi-layer fusion features |
title_short | Knowledge distillation based on multi-layer fusion features |
title_sort | knowledge distillation based on multi-layer fusion features |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10461825/ https://www.ncbi.nlm.nih.gov/pubmed/37639443 http://dx.doi.org/10.1371/journal.pone.0285901 |
work_keys_str_mv | AT tanshengyuan knowledgedistillationbasedonmultilayerfusionfeatures AT guorongzuo knowledgedistillationbasedonmultilayerfusionfeatures AT tangjialiang knowledgedistillationbasedonmultilayerfusionfeatures AT jiangning knowledgedistillationbasedonmultilayerfusionfeatures AT zoujunying knowledgedistillationbasedonmultilayerfusionfeatures |
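The description field above outlines the core mechanism: tap feature maps at several depths of the teacher network, fuse them into a single richer map, and train the student to match that fused target. The sketch below illustrates this general idea in PyTorch. It is not the authors' MFKD implementation; the tapped layer names, the resize-and-concatenate fusion operator, the student-side 1x1 adapter, and the MSE feature loss are all assumptions chosen to keep the example short.

```python
# Illustrative sketch only: a generic multi-layer feature-fusion distillation
# target in PyTorch. Layer choices, the fusion operator, and the loss are
# assumptions for demonstration, not the exact MFKD design from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TeacherFeatureFusion:
    """Hooks several teacher layers and fuses their feature maps into one target."""

    def __init__(self, teacher: nn.Module, layer_names):
        self.teacher = teacher.eval()          # frozen, pre-trained teacher
        self.layer_names = list(layer_names)
        self._feats = {}
        for name, module in teacher.named_modules():
            if name in self.layer_names:
                module.register_forward_hook(self._save(name))

    def _save(self, name):
        def hook(_module, _inputs, output):
            self._feats[name] = output
        return hook

    @torch.no_grad()
    def __call__(self, x, out_size):
        self.teacher(x)                        # forward pass fills self._feats via the hooks
        maps = []
        for name in self.layer_names:          # e.g. front, middle, and bottom layers
            f = F.interpolate(self._feats[name], size=out_size,
                              mode="bilinear", align_corners=False)
            maps.append(f)
        return torch.cat(maps, dim=1)          # channel-wise concatenation as the fusion step


def fusion_distill_loss(student_feat, adapter, fused_teacher_feat):
    """MSE between the adapted student feature map and the fused teacher target."""
    return F.mse_loss(adapter(student_feat), fused_teacher_feat)
```

In a training loop, this term would typically be added to the usual cross-entropy loss (and optionally a logit-based KD loss) with a weighting factor. The adapter can be as simple as nn.LazyConv2d(out_channels=fused_width, kernel_size=1), where fused_width is the sum of the tapped teacher layers' channel counts, and out_size would usually be the spatial size of the student feature map. This wiring is likewise an assumption, not the paper's reported setup.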