Adaptive Modular Convolutional Neural Network for Image Recognition
Image recognition has long been a research hotspot in computer vision. Deep learning has developed rapidly in recent years, and convolutional neural networks are usually designed under a fixed resource budget. If sufficient resources are available, the model can be scaled up to a...
Main authors: | Wu, Wenbo; Pan, Yun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9330193/ https://www.ncbi.nlm.nih.gov/pubmed/35897991 http://dx.doi.org/10.3390/s22155488 |
_version_ | 1784758102994714624 |
---|---|
author | Wu, Wenbo Pan, Yun |
author_facet | Wu, Wenbo Pan, Yun |
author_sort | Wu, Wenbo |
collection | PubMed |
description | Image recognition has long been a research hotspot in computer vision. Deep learning has developed rapidly in recent years, and convolutional neural networks are usually designed under a fixed resource budget. If sufficient resources are available, the model can be scaled up to achieve higher accuracy, as in VggNet, ResNet, and GoogLeNet. Although large-scale models achieve higher accuracy, scaling up introduces the following problems: (1) possible over-fitting; (2) a growing number of model parameters; (3) slow model convergence. This paper proposes a design method for a modular convolutional neural network that addresses the problems of over-fitting and large parameter counts by connecting multiple modules in parallel. Each module contains several submodules (three in this paper) and fuses the features extracted by them. Because the fused features contain more image information, using them can accelerate model convergence. In this study, we add a gate unit based on the attention mechanism, which optimizes the model structure by selecting the optimal number of modules; this allows the model to learn an optimum network structure and dynamically reduce its FLOPs (floating-point operations). Compared to VggNet, ResNet, and GoogLeNet, the proposed model has a simple structure and few parameters. The proposed model achieves good results on the Kaggle datasets Cats-vs.-Dogs (99.3%), 10-Monkey Species (99.26%), and Birds-400 (99.13%). |
format | Online Article Text |
id | pubmed-9330193 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9330193 2022-07-29 Adaptive Modular Convolutional Neural Network for Image Recognition Wu, Wenbo Pan, Yun Sensors (Basel) Communication MDPI 2022-07-22 /pmc/articles/PMC9330193/ /pubmed/35897991 http://dx.doi.org/10.3390/s22155488 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Communication Wu, Wenbo Pan, Yun Adaptive Modular Convolutional Neural Network for Image Recognition |
title | Adaptive Modular Convolutional Neural Network for Image Recognition |
title_full | Adaptive Modular Convolutional Neural Network for Image Recognition |
title_fullStr | Adaptive Modular Convolutional Neural Network for Image Recognition |
title_full_unstemmed | Adaptive Modular Convolutional Neural Network for Image Recognition |
title_short | Adaptive Modular Convolutional Neural Network for Image Recognition |
title_sort | adaptive modular convolutional neural network for image recognition |
topic | Communication |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9330193/ https://www.ncbi.nlm.nih.gov/pubmed/35897991 http://dx.doi.org/10.3390/s22155488 |
work_keys_str_mv | AT wuwenbo adaptivemodularconvolutionalneuralnetworkforimagerecognition AT panyun adaptivemodularconvolutionalneuralnetworkforimagerecognition |
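
The description field in this record outlines parallel modules, each fusing the features of three submodules, with an attention-based gate unit that selects how many modules to run. The following is a minimal pure-Python sketch of that idea and not the authors' implementation. Everything in it is an assumption made for illustration: the names `make_submodule`, `make_module`, and `gated_forward`, the use of concatenation for fusion, the softmax gate with a fixed `threshold`, and the hand-picked `gate_logits` (in the paper the gate is learned). The submodules are simple scaling functions standing in for convolutional blocks.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_submodule(scale):
    """Hypothetical stand-in for a convolutional submodule: scales its input."""
    def submodule(x):
        return [scale * v for v in x]
    return submodule

def make_module(scales):
    """A module with one submodule per scale; fusion is concatenation (assumption)."""
    subs = [make_submodule(s) for s in scales]
    def module(x):
        fused = []
        for sub in subs:
            fused.extend(sub(x))
        return fused
    return module

def gated_forward(x, modules, gate_logits, threshold=0.2):
    """Run only the modules whose gate weight reaches the threshold.

    Skipped modules are never evaluated, which is where the compute saving
    comes from in this sketch. The largest softmax weight is at least
    1/len(modules), so at least one module stays active whenever
    threshold <= 1/len(modules).
    """
    weights = softmax(gate_logits)
    active = [i for i, w in enumerate(weights) if w >= threshold]
    out = None
    for i in active:
        feats = [weights[i] * f for f in modules[i](x)]
        out = feats if out is None else [a + b for a, b in zip(out, feats)]
    return out, active

# Three parallel modules, each with three submodules.
modules = [
    make_module((1.0, 2.0, 3.0)),
    make_module((0.5, 1.0, 1.5)),
    make_module((2.0, 4.0, 6.0)),
]
gate_logits = [2.0, 0.0, 2.0]  # fixed here; learned in a real model
x = [1.0, 2.0]
out, active = gated_forward(x, modules, gate_logits)
print("active modules:", active)
print("fused feature length:", len(out))
```

With these illustrative logits the middle module falls below the threshold and is skipped, so only two of the three modules are evaluated. A trained gate would produce input-dependent logits rather than constants.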