Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation

Bibliographic Details
Main Authors: Song, Kechen, Zhang, Yiming, Bao, Yanqi, Zhao, Ying, Yan, Yunhui
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386587/
https://www.ncbi.nlm.nih.gov/pubmed/37514905
http://dx.doi.org/10.3390/s23146612
_version_ 1785081704323481600
author Song, Kechen
Zhang, Yiming
Bao, Yanqi
Zhao, Ying
Yan, Yunhui
author_facet Song, Kechen
Zhang, Yiming
Bao, Yanqi
Zhao, Ying
Yan, Yunhui
author_sort Song, Kechen
collection PubMed
description As an important computer vision technique, image segmentation is widely used in a variety of tasks. In extreme cases, however, insufficient illumination can greatly degrade model performance, so an increasing number of fully supervised methods take multi-modal images as input. Large, densely annotated datasets are difficult to obtain, yet few-shot methods can still achieve satisfactory results with only a few pixel-annotated samples. We therefore propose a few-shot semantic segmentation method for Visible-Depth-Thermal (three-modal) images. It exploits the homogeneous information of the three-modal images and the complementary information between different modalities, which improves the performance of few-shot segmentation. We construct a novel indoor dataset, VDT-2048-5(i), for the three-modal few-shot semantic segmentation task. We also propose a Self-Enhanced Mixed Attention Network (SEMANet), which consists of a Self-Enhanced (SE) module and a Mixed Attention (MA) module. The SE module amplifies the differences between different kinds of features and strengthens weak connections of the foreground features. The MA module fuses the three-modal features to obtain a better representation. Compared with the most advanced previous methods, our model improves mIoU by 3.8% and 3.3% in the 1-shot and 5-shot settings, respectively, achieving state-of-the-art performance. In future work, we will address failure cases by obtaining more discriminative and robust feature representations, and explore achieving high performance with fewer parameters and lower computational cost.
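A minimal illustrative sketch of the kind of three-modal attention fusion the abstract describes is given below. The module name, the squeeze-and-excitation-style gating, the 1x1 fusion convolution, and all tensor shapes are assumptions made for illustration; the actual SEMANet SE and MA modules are defined in the full article, not in this record.

import torch
import torch.nn as nn

class ThreeModalAttentionFusion(nn.Module):
    # Hypothetical fusion of visible (RGB), depth, and thermal feature maps:
    # each modality is re-weighted by a channel-attention gate, then the three
    # gated maps are concatenated and projected back to a single feature map.
    def __init__(self, channels: int):
        super().__init__()
        # One squeeze-and-excitation-style gate per modality (assumed design).
        self.gates = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, channels, kernel_size=1),
                nn.Sigmoid(),
            )
            for _ in range(3)
        ])
        # Project the concatenated, gated modalities back to `channels` maps.
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, rgb, depth, thermal):
        gated = [g(f) * f for g, f in zip(self.gates, (rgb, depth, thermal))]
        return self.fuse(torch.cat(gated, dim=1))

if __name__ == "__main__":
    fusion = ThreeModalAttentionFusion(channels=64)
    rgb, depth, thermal = (torch.randn(1, 64, 32, 32) for _ in range(3))
    print(fusion(rgb, depth, thermal).shape)  # torch.Size([1, 64, 32, 32])

In a few-shot setting, a fused support feature of this kind would typically be compared against fused query features (for example, via class prototypes or dense correlation) before the reported mIoU is computed over the novel classes.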
format Online
Article
Text
id pubmed-10386587
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10386587 2023-07-30 Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation Song, Kechen Zhang, Yiming Bao, Yanqi Zhao, Ying Yan, Yunhui Sensors (Basel) Article As an important computer vision technique, image segmentation is widely used in a variety of tasks. In extreme cases, however, insufficient illumination can greatly degrade model performance, so an increasing number of fully supervised methods take multi-modal images as input. Large, densely annotated datasets are difficult to obtain, yet few-shot methods can still achieve satisfactory results with only a few pixel-annotated samples. We therefore propose a few-shot semantic segmentation method for Visible-Depth-Thermal (three-modal) images. It exploits the homogeneous information of the three-modal images and the complementary information between different modalities, which improves the performance of few-shot segmentation. We construct a novel indoor dataset, VDT-2048-5(i), for the three-modal few-shot semantic segmentation task. We also propose a Self-Enhanced Mixed Attention Network (SEMANet), which consists of a Self-Enhanced (SE) module and a Mixed Attention (MA) module. The SE module amplifies the differences between different kinds of features and strengthens weak connections of the foreground features. The MA module fuses the three-modal features to obtain a better representation. Compared with the most advanced previous methods, our model improves mIoU by 3.8% and 3.3% in the 1-shot and 5-shot settings, respectively, achieving state-of-the-art performance. In future work, we will address failure cases by obtaining more discriminative and robust feature representations, and explore achieving high performance with fewer parameters and lower computational cost. MDPI 2023-07-22 /pmc/articles/PMC10386587/ /pubmed/37514905 http://dx.doi.org/10.3390/s23146612 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Song, Kechen
Zhang, Yiming
Bao, Yanqi
Zhao, Ying
Yan, Yunhui
Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title_full Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title_fullStr Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title_full_unstemmed Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title_short Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation
title_sort self-enhanced mixed attention network for three-modal images few-shot semantic segmentation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386587/
https://www.ncbi.nlm.nih.gov/pubmed/37514905
http://dx.doi.org/10.3390/s23146612
work_keys_str_mv AT songkechen selfenhancedmixedattentionnetworkforthreemodalimagesfewshotsemanticsegmentation
AT zhangyiming selfenhancedmixedattentionnetworkforthreemodalimagesfewshotsemanticsegmentation
AT baoyanqi selfenhancedmixedattentionnetworkforthreemodalimagesfewshotsemanticsegmentation
AT zhaoying selfenhancedmixedattentionnetworkforthreemodalimagesfewshotsemanticsegmentation
AT yanyunhui selfenhancedmixedattentionnetworkforthreemodalimagesfewshotsemanticsegmentation