Differentiable Network Pruning via Polarization of Probabilistic Channelwise Soft Masks

Bibliographic Details
Main Authors: Ma, Ming; Wang, Jiapeng; Yu, Zhenhua
Format: Online Article (Text)
Language: English
Published: Hindawi, 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9098282/
https://www.ncbi.nlm.nih.gov/pubmed/35571691
http://dx.doi.org/10.1155/2022/7775419
Description
Summary: Channel pruning has been demonstrated as a highly effective approach to compressing large convolutional neural networks. Existing differentiable channel pruning methods usually use deterministic soft masks to scale the channelwise outputs and then search for an appropriate threshold on the masks to remove unimportant channels, which can unexpectedly damage network accuracy when no sweet spot clearly separates important channels from redundant ones. In this article, we introduce a new differentiable channel pruning method based on polarization of probabilistic channelwise soft masks (PPSMs). We use variational inference to approximate the posterior distributions of the masks and simultaneously apply a polarization regularization that pushes the probabilistic masks towards either 0 or 1; the channels with near-zero masks can thus be safely eliminated with little harm to network accuracy. Our method significantly eases the difficulty existing methods face in finding an appropriate threshold on the masks. The joint inference and polarization of probabilistic soft masks enable PPSM to yield better pruning results than state-of-the-art methods. For instance, our method prunes 65.91% of the FLOPs of ResNet50 on the ImageNet dataset with only a 0.7% degradation in model accuracy.
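The mechanism the abstract describes, a stochastic channelwise mask learned variationally plus a regularizer that polarizes masks toward 0 or 1, can be sketched in a few lines of PyTorch. The sketch below is an illustration under my own assumptions, not the authors' implementation: the Gaussian-logit mask parameterization, the m·(1−m) polarization penalty, and all names (ProbabilisticChannelMask, polarization_loss, log_sigma) are hypothetical stand-ins, and the KL/prior term of the full variational objective is omitted for brevity.

```python
# Minimal sketch of probabilistic channelwise soft masks with polarization.
# Assumed design: each channel's mask is the sigmoid of a Gaussian logit,
# sampled with the reparameterization trick so gradients flow to (mu, sigma).
import torch
import torch.nn as nn

class ProbabilisticChannelMask(nn.Module):
    """Scales each channel of a (N, C, H, W) tensor by a soft mask in (0, 1)."""
    def __init__(self, num_channels):
        super().__init__()
        # Variational parameters per channel (hypothetical parameterization).
        self.mu = nn.Parameter(torch.zeros(num_channels))
        self.log_sigma = nn.Parameter(torch.full((num_channels,), -3.0))

    def sample_mask(self):
        eps = torch.randn_like(self.mu)                # eps ~ N(0, I)
        logits = self.mu + self.log_sigma.exp() * eps  # reparameterized logit
        return torch.sigmoid(logits)                   # soft mask in (0, 1)

    def forward(self, x):
        # Sample during training; use the mean mask at evaluation time.
        m = self.sample_mask() if self.training else torch.sigmoid(self.mu)
        return x * m.view(1, -1, 1, 1)

def polarization_loss(mask):
    # m * (1 - m) peaks at m = 0.5 and vanishes at 0 and 1, so minimizing
    # it pushes every channel mask toward one of the two poles.
    return (mask * (1.0 - mask)).mean()

# Usage sketch: place the mask after a conv layer and add the weighted
# penalty to the task loss (the weight is a hyperparameter).
mask_layer = ProbabilisticChannelMask(num_channels=64)
feats = mask_layer(torch.randn(8, 64, 32, 32))
penalty = polarization_loss(mask_layer.sample_mask())
```

After training, channels whose mask settles near zero can be removed. Because the penalty vanishes only at the poles, the masks cluster near 0 or 1, which is why a pruning threshold near zero is far less delicate to choose than with unpolarized soft masks, consistent with the claim in the summary above.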