
Adaptive Modular Convolutional Neural Network for Image Recognition

Bibliographic Details
Main authors: Wu, Wenbo; Pan, Yun
Format: Online article, text
Language: English
Published: MDPI, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9330193/
https://www.ncbi.nlm.nih.gov/pubmed/35897991
http://dx.doi.org/10.3390/s22155488
collection PubMed
description Image recognition has long been one of the research hotspots in computer vision. Deep learning has developed rapidly in recent years, and convolutional neural networks are usually designed for a fixed resource budget. If more resources are available, the model can be scaled up to achieve higher accuracy, as with VggNet, ResNet, and GoogLeNet. Although large-scale models are more accurate, scaling up a model introduces the following problems: (1) over-fitting; (2) a growing number of parameters; and (3) slow convergence. This paper proposes a design method for a modular convolutional neural network that addresses over-fitting and large parameter counts by connecting multiple modules in parallel. Each module contains several submodules (three in this paper) and fuses the features extracted by its submodules. Because the fused features contain more image information, using them accelerates model convergence. We also add a gate unit based on the attention mechanism, which optimizes the model structure by selecting the optimal number of modules; this lets the model learn an optimal network structure and dynamically reduce its FLOPs (floating-point operations). Compared with VggNet, ResNet, and GoogLeNet, the proposed model has a simpler structure and fewer parameters. The proposed model achieves good results on the Kaggle datasets Cats-vs.-Dogs (99.3%), 10-Monkey Species (99.26%), and Birds-400 (99.13%).
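The abstract describes three ideas: parallel modules, per-module fusion of three submodules, and an attention-based gate that selects how many modules run. The following is a minimal, framework-free sketch of how such gating could work, not the authors' implementation; the abstract does not give the gate's equations. Several choices here are assumptions: element-wise summation as the fusion operator, a dot-product score between a hypothetical per-module query vector and the input, a softmax over module scores, and a hypothetical `threshold` below which a module is skipped. The toy submodules (`identity`, `double`, `negate`) and `DEMO_MODULES` stand in for convolutional branches and are purely illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Submodule = Callable[[Vector], Vector]


def fuse_submodules(x: Vector, submodules: Sequence[Submodule]) -> Vector:
    """Run every submodule on the same input and fuse the results.

    Assumption: element-wise summation is used as the fusion operator.
    """
    outputs = [sub(x) for sub in submodules]
    return [sum(values) for values in zip(*outputs)]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_forward(
    x: Vector,
    modules: Sequence[Sequence[Submodule]],
    gate_queries: Sequence[Vector],
    threshold: float,
) -> Tuple[Vector, List[int]]:
    """Hypothetical attention-style gate over parallel modules.

    Each module is scored by the dot product of its query vector with x.
    Modules whose softmax weight is below `threshold` are skipped, so their
    submodules are never evaluated; this is where compute is saved.
    Returns the weighted combination of active modules and their indices.
    """
    scores = [sum(q * v for q, v in zip(query, x)) for query in gate_queries]
    weights = softmax(scores)
    active = [i for i, w in enumerate(weights) if w >= threshold]
    if not active:  # always keep at least the highest-weighted module
        active = [max(range(len(weights)), key=weights.__getitem__)]
    norm = sum(weights[i] for i in active)  # renormalize over active modules
    scaled = [(weights[i] / norm, fuse_submodules(x, modules[i])) for i in active]
    width = len(scaled[0][1])
    out = [sum(scale * vec[k] for scale, vec in scaled) for k in range(width)]
    return out, active


# Toy stand-ins for convolutional submodules (hypothetical, illustration only).
def identity(v: Vector) -> Vector:
    return list(v)


def double(v: Vector) -> Vector:
    return [2.0 * a for a in v]


def negate(v: Vector) -> Vector:
    return [-a for a in v]


# Two parallel modules, each with three submodules as in the paper's setup.
DEMO_MODULES = [[identity, double, negate], [double, double, double]]

if __name__ == "__main__":
    out, active = gated_forward([1.0, 0.0], DEMO_MODULES, [[4.0, 0.0], [0.0, 0.0]], 0.1)
    print(out, active)
```

With the gate strongly favoring the first module, only that module's three submodules are evaluated and the second module is skipped entirely. In a trained network the queries would be learned jointly with the convolutional weights, and a hard or thresholded selection like this is one possible way to turn the gate's output into a reduced FLOP count at inference time.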
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Sensors (Basel), Communication, MDPI, published 2022-07-22
rights © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).