
Multistructure-Based Collaborative Online Distillation

Bibliographic Details
Main Authors: Gao, Liang, Lan, Xu, Mi, Haibo, Feng, Dawei, Xu, Kele, Peng, Yuxing
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514841/
https://www.ncbi.nlm.nih.gov/pubmed/33267071
http://dx.doi.org/10.3390/e21040357
_version_ 1783586681736134656
author Gao, Liang
Lan, Xu
Mi, Haibo
Feng, Dawei
Xu, Kele
Peng, Yuxing
author_facet Gao, Liang
Lan, Xu
Mi, Haibo
Feng, Dawei
Xu, Kele
Peng, Yuxing
author_sort Gao, Liang
collection PubMed
description Recently, deep learning has achieved state-of-the-art performance in many areas, surpassing traditional machine-learning methods based on shallow architectures. However, achieving higher accuracy usually requires extending the network depth or ensembling the results of several neural networks, both of which increase the demand for memory and computing resources. This makes it difficult to deploy deep-learning models in resource-constrained scenarios such as drones, mobile phones, and autonomous driving. Improving network performance without expanding the network scale has therefore become an active research topic. In this paper, we propose a cross-architecture online-distillation approach that addresses this problem by transmitting supplementary information among different networks. We use an ensemble method to aggregate networks of different structures, thus forming a better teacher than those used in traditional distillation methods. In addition, discontinuous distillation with progressively enhanced constraints replaces fixed distillation in order to reduce the loss of information diversity during the distillation process. Our training method improves the distillation effect and yields strong gains in network performance. We validated the approach on several popular models: on the CIFAR100 dataset, AlexNet’s accuracy improved by 5.94%, VGG by 2.88%, ResNet by 5.07%, and DenseNet by 1.28%. Extensive experiments on the CIFAR10, CIFAR100, and ImageNet datasets demonstrate the effectiveness of the proposed method, with significant improvements over traditional knowledge distillation.
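
The description above outlines the method: peer networks with different architectures are trained collaboratively, the ensemble of their predictions serves as the teacher, and distillation is applied discontinuously with progressively stronger constraints. The following Python (PyTorch) sketch illustrates that general idea only; the temperature, loss weighting, averaging teacher, schedule, and all function names are assumptions made for illustration, not the authors' implementation.

# Hypothetical sketch of cross-architecture online distillation: several peers of
# different architectures train together, and each is also pulled toward the
# averaged (ensemble) soft predictions of all peers. Hyperparameters are assumed.
import torch
import torch.nn.functional as F

T = 3.0  # softening temperature (assumed value)

def ensemble_teacher(all_logits):
    """Average the temperature-softened class probabilities of all peer networks."""
    probs = [F.softmax(logits / T, dim=1) for logits in all_logits]
    return torch.stack(probs, dim=0).mean(dim=0)

def distillation_loss(student_logits, teacher_probs, targets, alpha):
    """Cross-entropy on the labels plus a KL term pulling the student toward the teacher."""
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  teacher_probs, reduction="batchmean") * (T * T)
    return ce + alpha * kl

def train_epoch(models, optimizers, loader, alpha):
    """One epoch of collaborative training; every peer is both a student and part of the teacher."""
    for images, targets in loader:
        all_logits = [m(images) for m in models]
        teacher = ensemble_teacher([l.detach() for l in all_logits])  # no gradient through the teacher
        for logits, opt in zip(all_logits, optimizers):
            loss = distillation_loss(logits, teacher, targets, alpha)
            opt.zero_grad()
            loss.backward()
            opt.step()

# "Discontinuous" distillation with progressively enhanced constraints: distill only
# every few epochs, and raise the distillation weight over time (illustrative schedule).
def alpha_schedule(epoch, period=5, alpha_max=1.0, total_epochs=100):
    if epoch % period != 0:
        return 0.0                               # ordinary independent training
    return alpha_max * epoch / total_epochs      # constraint grows as training proceeds

In practice, models could be, for example, torchvision implementations of ResNet, VGG, and DenseNet, each with its own optimizer; the schedule only illustrates the idea of pausing distillation and tightening the constraint over training, since the exact intervals and weights used in the paper are not given in this record.
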
format Online
Article
Text
id pubmed-7514841
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7514841 2020-11-09 Multistructure-Based Collaborative Online Distillation Gao, Liang Lan, Xu Mi, Haibo Feng, Dawei Xu, Kele Peng, Yuxing Entropy (Basel) Article Recently, deep learning has achieved state-of-the-art performance in many areas, surpassing traditional machine-learning methods based on shallow architectures. However, achieving higher accuracy usually requires extending the network depth or ensembling the results of several neural networks, both of which increase the demand for memory and computing resources. This makes it difficult to deploy deep-learning models in resource-constrained scenarios such as drones, mobile phones, and autonomous driving. Improving network performance without expanding the network scale has therefore become an active research topic. In this paper, we propose a cross-architecture online-distillation approach that addresses this problem by transmitting supplementary information among different networks. We use an ensemble method to aggregate networks of different structures, thus forming a better teacher than those used in traditional distillation methods. In addition, discontinuous distillation with progressively enhanced constraints replaces fixed distillation in order to reduce the loss of information diversity during the distillation process. Our training method improves the distillation effect and yields strong gains in network performance. We validated the approach on several popular models: on the CIFAR100 dataset, AlexNet’s accuracy improved by 5.94%, VGG by 2.88%, ResNet by 5.07%, and DenseNet by 1.28%. Extensive experiments on the CIFAR10, CIFAR100, and ImageNet datasets demonstrate the effectiveness of the proposed method, with significant improvements over traditional knowledge distillation. MDPI 2019-04-02 /pmc/articles/PMC7514841/ /pubmed/33267071 http://dx.doi.org/10.3390/e21040357 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Gao, Liang
Lan, Xu
Mi, Haibo
Feng, Dawei
Xu, Kele
Peng, Yuxing
Multistructure-Based Collaborative Online Distillation
title Multistructure-Based Collaborative Online Distillation
title_full Multistructure-Based Collaborative Online Distillation
title_fullStr Multistructure-Based Collaborative Online Distillation
title_full_unstemmed Multistructure-Based Collaborative Online Distillation
title_short Multistructure-Based Collaborative Online Distillation
title_sort multistructure-based collaborative online distillation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514841/
https://www.ncbi.nlm.nih.gov/pubmed/33267071
http://dx.doi.org/10.3390/e21040357
work_keys_str_mv AT gaoliang multistructurebasedcollaborativeonlinedistillation
AT lanxu multistructurebasedcollaborativeonlinedistillation
AT mihaibo multistructurebasedcollaborativeonlinedistillation
AT fengdawei multistructurebasedcollaborativeonlinedistillation
AT xukele multistructurebasedcollaborativeonlinedistillation
AT pengyuxing multistructurebasedcollaborativeonlinedistillation