CMANet: Cross-Modality Attention Network for Indoor-Scene Semantic Segmentation
Main authors: Zhu, Longze; Kang, Zhizhong; Zhou, Mei; Yang, Xi; Wang, Zhen; Cao, Zhen; Ye, Chenming
Format: Online Article Text
Language: English
Published: MDPI, 2022
Subjects: Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9659145/ | https://www.ncbi.nlm.nih.gov/pubmed/36366217 | http://dx.doi.org/10.3390/s22218520
Abstract: Indoor-scene semantic segmentation is of great significance to indoor navigation, high-precision map creation, route planning, etc. However, incorporating RGB and HHA images for indoor-scene semantic segmentation is a promising yet challenging task, due to the diversity of textures and structures and the disparity of the two modalities in physical significance. In this paper, we propose a Cross-Modality Attention Network (CMANet) that facilitates the extraction of both RGB and HHA features and enhances cross-modality feature integration. CMANet is built on an encoder–decoder architecture. The encoder consists of two parallel branches that successively extract latent modality features from the RGB and HHA images, respectively. In particular, a novel self-attention-based Cross-Modality Refine Gate (CMRG) is presented, which bridges the two branches. More importantly, the CMRG achieves cross-modality feature fusion and produces refined aggregated features; it serves as the most crucial part of CMANet. The decoder is a multi-stage up-sampling backbone composed of different residual blocks at each up-sampling stage. Furthermore, bi-directional multi-step propagation and pyramid supervision are applied to assist the learning process. To evaluate the effectiveness and efficiency of the proposed method, extensive experiments are conducted on the NYUDv2 and SUN RGB-D datasets. Experimental results demonstrate that our method outperforms existing ones on indoor semantic-segmentation tasks.
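The abstract describes the Cross-Modality Refine Gate only at a high level. As a rough illustration of the general idea it names (attention-gated fusion of same-resolution RGB and HHA feature maps into one refined, aggregated feature map), here is a minimal PyTorch sketch. The class name, the channel-attention form, and all shapes and hyperparameters are assumptions made for illustration, not the authors' implementation:

```python
# Hypothetical sketch of a cross-modality fusion gate in the spirit of the
# paper's CMRG. The exact attention design is not given in this record, so
# this uses a simple squeeze-and-excitation-style channel gate as a stand-in.
import torch
import torch.nn as nn


class CrossModalityRefineGate(nn.Module):
    """Fuses same-resolution RGB and HHA feature maps via channel attention."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Bottleneck MLP that maps the pooled joint descriptor to one
        # sigmoid gate per channel of each modality (2 * channels gates).
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, max(channels // reduction, 4)),
            nn.ReLU(inplace=True),
            nn.Linear(max(channels // reduction, 4), 2 * channels),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, rgb: torch.Tensor, hha: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = rgb.shape
        # Global descriptors of both modalities, concatenated along channels.
        desc = torch.cat([self.pool(rgb), self.pool(hha)], dim=1).flatten(1)
        gates = self.sigmoid(self.mlp(desc)).view(b, 2 * c, 1, 1)
        g_rgb, g_hha = gates[:, :c], gates[:, c:]
        # Re-weight each modality, then aggregate into one refined map that
        # a two-branch encoder could pass on to the shared decoder.
        return g_rgb * rgb + g_hha * hha


if __name__ == "__main__":
    # Usage: fuse 64-channel feature maps from the two encoder branches.
    gate = CrossModalityRefineGate(channels=64)
    rgb_feat = torch.randn(2, 64, 120, 160)
    hha_feat = torch.randn(2, 64, 120, 160)
    print(gate(rgb_feat, hha_feat).shape)  # torch.Size([2, 64, 120, 160])
```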
_version_ | 1784830129120215040 |
author | Zhu, Longze; Kang, Zhizhong; Zhou, Mei; Yang, Xi; Wang, Zhen; Cao, Zhen; Ye, Chenming
author_sort | Zhu, Longze |
collection | PubMed |
format | Online Article Text |
id | pubmed-9659145 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9659145 2022-11-15 Sensors (Basel) Article MDPI 2022-11-05 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title | CMANet: Cross-Modality Attention Network for Indoor-Scene Semantic Segmentation
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9659145/ https://www.ncbi.nlm.nih.gov/pubmed/36366217 http://dx.doi.org/10.3390/s22218520 |
work_keys_str_mv | AT zhulongze cmanetcrossmodalityattentionnetworkforindoorscenesemanticsegmentation AT kangzhizhong cmanetcrossmodalityattentionnetworkforindoorscenesemanticsegmentation AT zhoumei cmanetcrossmodalityattentionnetworkforindoorscenesemanticsegmentation AT yangxi cmanetcrossmodalityattentionnetworkforindoorscenesemanticsegmentation AT wangzhen cmanetcrossmodalityattentionnetworkforindoorscenesemanticsegmentation AT caozhen cmanetcrossmodalityattentionnetworkforindoorscenesemanticsegmentation AT yechenming cmanetcrossmodalityattentionnetworkforindoorscenesemanticsegmentation |