Rethinking 1D convolution for lightweight semantic segmentation

Lightweight semantic segmentation promotes the application of semantic segmentation in tiny devices. The existing lightweight semantic segmentation network (LSNet) has the problems of low precision and a large number of parameters. In response to the above problems, we designed a full 1D convolution...
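The summary above motivates 1D convolution by the parameter cost of existing networks. As a general illustration only (not the authors' architecture, which this record does not specify), the sketch below compares the weight count of one k x k 2D convolution with a pair of 1D convolutions (k x 1 followed by 1 x k). The function names, channel widths, and kernel size are hypothetical and chosen for the example.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Weights (plus optional biases) of a single k x k 2D convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def conv1d_pair_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Weights of a k x 1 convolution followed by a 1 x k convolution.

    Assumption for illustration: the first layer maps c_in -> c_out and the
    second layer keeps c_out channels.
    """
    first = k * c_in * c_out + (c_out if bias else 0)
    second = k * c_out * c_out + (c_out if bias else 0)
    return first + second


if __name__ == "__main__":
    c_in, c_out, k = 64, 64, 3  # hypothetical sizes, not taken from the paper
    print("2D k x k:", conv2d_params(c_in, c_out, k))
    print("1D pair :", conv1d_pair_params(c_in, c_out, k))
```

At equal channel widths the 2D layer grows with k squared while the 1D pair grows with 2k, so the gap widens as the kernel gets larger.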

Bibliographic Details
Main authors: Zhang, Chunyu; Xu, Fang; Wu, Chengdong; Xu, Chenglong
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Neuroscience
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9947531/
https://www.ncbi.nlm.nih.gov/pubmed/36845064
http://dx.doi.org/10.3389/fnbot.2023.1119231
_version_ 1784892576446283776
author Zhang, Chunyu
Xu, Fang
Wu, Chengdong
Xu, Chenglong
author_facet Zhang, Chunyu
Xu, Fang
Wu, Chengdong
Xu, Chenglong
author_sort Zhang, Chunyu
collection PubMed
description Lightweight semantic segmentation promotes the application of semantic segmentation in tiny devices. The existing lightweight semantic segmentation network (LSNet) has the problems of low precision and a large number of parameters. In response to the above problems, we designed a full 1D convolutional LSNet. The tremendous success of this network is attributed to the following three modules: 1D multi-layer space module (1D-MS), 1D multi-layer channel module (1D-MC), and flow alignment module (FA). The 1D-MS and the 1D-MC add global feature extraction operations based on the multi-layer perceptron (MLP) idea. This module uses 1D convolutional coding, which is more flexible than MLP. It increases the global information operation, improving features’ coding ability. The FA module fuses high-level and low-level semantic information, which solves the problem of precision loss caused by the misalignment of features. We designed a 1D-mixer encoder based on the transformer structure. It performed fusion encoding of the feature space information extracted by the 1D-MS module and the channel information extracted by the 1D-MC module. 1D-mixer obtains high-quality encoded features with very few parameters, which is the key to the network’s success. The attention pyramid with FA (AP-FA) uses an AP to decode features and adds an FA module to solve the problem of feature misalignment. Our network requires no pre-training and only needs a 1080Ti GPU for training. It achieved 72.6 mIoU and 95.6 FPS on the Cityscapes dataset and 70.5 mIoU and 122 FPS on the CamVid dataset. We ported the network trained on the ADE20K dataset to mobile devices, and the latency of 224 ms proves the application value of the network on mobile devices. The results on the three datasets prove that the network generalization ability we designed is powerful. Compared to state-of-the-art lightweight semantic segmentation algorithms, our designed network achieves the best balance between segmentation accuracy and parameters. The parameters of LSNet are only 0.62 M, which is currently the network with the highest segmentation accuracy within 1 M parameters.
format Online
Article
Text
id pubmed-9947531
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-99475312023-02-24 Rethinking 1D convolution for lightweight semantic segmentation Zhang, Chunyu Xu, Fang Wu, Chengdong Xu, Chenglong Front Neurorobot Neuroscience Lightweight semantic segmentation promotes the application of semantic segmentation in tiny devices. The existing lightweight semantic segmentation network (LSNet) has the problems of low precision and a large number of parameters. In response to the above problems, we designed a full 1D convolutional LSNet. The tremendous success of this network is attributed to the following three modules: 1D multi-layer space module (1D-MS), 1D multi-layer channel module (1D-MC), and flow alignment module (FA). The 1D-MS and the 1D-MC add global feature extraction operations based on the multi-layer perceptron (MLP) idea. This module uses 1D convolutional coding, which is more flexible than MLP. It increases the global information operation, improving features’ coding ability. The FA module fuses high-level and low-level semantic information, which solves the problem of precision loss caused by the misalignment of features. We designed a 1D-mixer encoder based on the transformer structure. It performed fusion encoding of the feature space information extracted by the 1D-MS module and the channel information extracted by the 1D-MC module. 1D-mixer obtains high-quality encoded features with very few parameters, which is the key to the network’s success. The attention pyramid with FA (AP-FA) uses an AP to decode features and adds a FA module to solve the problem of feature misalignment. Our network requires no pre-training and only needs a 1080Ti GPU for training. It achieved 72.6 mIoU and 95.6 FPS on the Cityscapes dataset and 70.5 mIoU and 122 FPS on the CamVid dataset. We ported the network trained on the ADE2K dataset to mobile devices, and the latency of 224 ms proves the application value of the network on mobile devices. 
The results on the three datasets prove that the network generalization ability we designed is powerful. Compared to state-of-the-art lightweight semantic segmentation algorithms, our designed network achieves the best balance between segmentation accuracy and parameters. The parameters of LSNet are only 0.62 M, which is currently the network with the highest segmentation accuracy within 1 M parameters. Frontiers Media S.A. 2023-02-09 /pmc/articles/PMC9947531/ /pubmed/36845064 http://dx.doi.org/10.3389/fnbot.2023.1119231 Text en Copyright © 2023 Zhang, Xu, Wu and Xu. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Zhang, Chunyu
Xu, Fang
Wu, Chengdong
Xu, Chenglong
Rethinking 1D convolution for lightweight semantic segmentation
title Rethinking 1D convolution for lightweight semantic segmentation
title_full Rethinking 1D convolution for lightweight semantic segmentation
title_fullStr Rethinking 1D convolution for lightweight semantic segmentation
title_full_unstemmed Rethinking 1D convolution for lightweight semantic segmentation
title_short Rethinking 1D convolution for lightweight semantic segmentation
title_sort rethinking 1d convolution for lightweight semantic segmentation
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9947531/
https://www.ncbi.nlm.nih.gov/pubmed/36845064
http://dx.doi.org/10.3389/fnbot.2023.1119231
work_keys_str_mv AT zhangchunyu rethinking1dconvolutionforlightweightsemanticsegmentation
AT xufang rethinking1dconvolutionforlightweightsemanticsegmentation
AT wuchengdong rethinking1dconvolutionforlightweightsemanticsegmentation
AT xuchenglong rethinking1dconvolutionforlightweightsemanticsegmentation