Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation

Bibliographic Details
Main Authors: Song, Kechen, Zhang, Yiming, Bao, Yanqi, Zhao, Ying, Yan, Yunhui
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386587/
https://www.ncbi.nlm.nih.gov/pubmed/37514905
http://dx.doi.org/10.3390/s23146612
_version_ 1785081704323481600
author Song, Kechen
Zhang, Yiming
Bao, Yanqi
Zhao, Ying
Yan, Yunhui
author_facet Song, Kechen
Zhang, Yiming
Bao, Yanqi
Zhao, Ying
Yan, Yunhui
author_sort Song, Kechen
collection PubMed
description As an important computer vision technique, image segmentation is widely used in a variety of tasks. In extreme cases, however, insufficient illumination can greatly degrade model performance, so an increasing number of fully supervised methods take multi-modal images as input. Large, densely annotated datasets are difficult to obtain, yet few-shot methods can still achieve satisfactory results with only a few pixel-annotated samples. We therefore propose a few-shot semantic segmentation method for Visible-Depth-Thermal (three-modal) images. It exploits the homogeneous information of the three-modal images and the complementary information between different modalities, which improves the performance of few-shot segmentation. We construct a novel indoor dataset, VDT-2048-5(i), for the three-modal few-shot semantic segmentation task. We also propose a Self-Enhanced Mixed Attention Network (SEMANet), which consists of a Self-Enhanced (SE) module and a Mixed Attention (MA) module. The SE module amplifies the differences between different kinds of features and strengthens weak connections of the foreground features. The MA module fuses the three-modal features to obtain a better representation. Compared with the most advanced previous methods, our model improves mIoU by 3.8% and 3.3% in the 1-shot and 5-shot settings, respectively, achieving state-of-the-art performance. In future work, we will address failure cases by obtaining more discriminative and robust feature representations, and explore achieving high performance with fewer parameters and lower computational cost.
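A minimal illustrative sketch of the kind of three-modal attention fusion the abstract describes is given below. The module name, the squeeze-and-excitation-style gating, the 1x1 fusion convolution, and all tensor shapes are assumptions made for illustration; the actual SEMANet SE and MA modules are defined in the full article, not in this record.

import torch
import torch.nn as nn

class ThreeModalAttentionFusion(nn.Module):
    # Hypothetical fusion of visible (RGB), depth, and thermal feature maps:
    # each modality is re-weighted by a channel-attention gate, then the three
    # gated maps are concatenated and projected back to a single feature map.
    def __init__(self, channels: int):
        super().__init__()
        # One squeeze-and-excitation-style gate per modality (assumed design).
        self.gates = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, channels, kernel_size=1),
                nn.Sigmoid(),
            )
            for _ in range(3)
        ])
        # Project the concatenated, gated modalities back to `channels` maps.
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, rgb, depth, thermal):
        gated = [g(f) * f for g, f in zip(self.gates, (rgb, depth, thermal))]
        return self.fuse(torch.cat(gated, dim=1))

if __name__ == "__main__":
    fusion = ThreeModalAttentionFusion(channels=64)
    rgb, depth, thermal = (torch.randn(1, 64, 32, 32) for _ in range(3))
    print(fusion(rgb, depth, thermal).shape)  # torch.Size([1, 64, 32, 32])

In a few-shot setting, a fused support feature of this kind would typically be compared against fused query features (for example, via class prototypes or dense correlation) before the reported mIoU is computed over the novel classes.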
format Online
Article
Text
id pubmed-10386587
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10386587 2023-07-30 Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation Song, Kechen Zhang, Yiming Bao, Yanqi Zhao, Ying Yan, Yunhui Sensors (Basel) Article As an important computer vision technique, image segmentation is widely used in a variety of tasks. In extreme cases, however, insufficient illumination can greatly degrade model performance, so an increasing number of fully supervised methods take multi-modal images as input. Large, densely annotated datasets are difficult to obtain, yet few-shot methods can still achieve satisfactory results with only a few pixel-annotated samples. We therefore propose a few-shot semantic segmentation method for Visible-Depth-Thermal (three-modal) images. It exploits the homogeneous information of the three-modal images and the complementary information between different modalities, which improves the performance of few-shot segmentation. We construct a novel indoor dataset, VDT-2048-5(i), for the three-modal few-shot semantic segmentation task. We also propose a Self-Enhanced Mixed Attention Network (SEMANet), which consists of a Self-Enhanced (SE) module and a Mixed Attention (MA) module. The SE module amplifies the differences between different kinds of features and strengthens weak connections of the foreground features. The MA module fuses the three-modal features to obtain a better representation. Compared with the most advanced previous methods, our model improves mIoU by 3.8% and 3.3% in the 1-shot and 5-shot settings, respectively, achieving state-of-the-art performance. In future work, we will address failure cases by obtaining more discriminative and robust feature representations, and explore achieving high performance with fewer parameters and lower computational cost. MDPI 2023-07-22 /pmc/articles/PMC10386587/ /pubmed/37514905 http://dx.doi.org/10.3390/s23146612 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Song, Kechen
Zhang, Yiming
Bao, Yanqi
Zhao, Ying
Yan, Yunhui
Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title_full Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title_fullStr Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title_full_unstemmed Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title_short Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title_sort self-enhanced mixed attention network for three-modal images few-shot semantic segmentation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386587/
https://www.ncbi.nlm.nih.gov/pubmed/37514905
http://dx.doi.org/10.3390/s23146612
work_keys_str_mv AT songkechen selfenhancedmixedattentionnetworkforthreemodalimagesfewshotsemanticsegmentation
AT zhangyiming selfenhancedmixedattentionnetworkforthreemodalimagesfewshotsemanticsegmentation
AT baoyanqi selfenhancedmixedattentionnetworkforthreemodalimagesfewshotsemanticsegmentation
AT zhaoying selfenhancedmixedattentionnetworkforthreemodalimagesfewshotsemanticsegmentation
AT yanyunhui selfenhancedmixedattentionnetworkforthreemodalimagesfewshotsemanticsegmentation