Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection


Bibliographic Details
Main Authors: Wang, Shuaihui, Jiang, Fengyi, Xu, Boqian
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10650861/
https://www.ncbi.nlm.nih.gov/pubmed/37960501
http://dx.doi.org/10.3390/s23218802
_version_ 1785135878638665728
author Wang, Shuaihui
Jiang, Fengyi
Xu, Boqian
author_facet Wang, Shuaihui
Jiang, Fengyi
Xu, Boqian
author_sort Wang, Shuaihui
collection PubMed
description Salient object detection (SOD), which identifies the most distinctive object in a given scene, plays an important role in computer vision tasks. Most existing RGB-D SOD methods employ a CNN-based network as the backbone to extract features from RGB and depth images; however, the inherent locality of CNNs limits the performance of these methods. To tackle this issue, we propose a novel Swin Transformer-based edge guidance network (SwinEGNet) for RGB-D SOD, in which the Swin Transformer serves as a powerful feature extractor that captures the global context and an edge-guided cross-modal interaction module enhances and fuses features effectively. Specifically, we employ the Swin Transformer as the backbone to extract features from RGB images and depth maps. We then introduce an edge extraction module (EEM) to extract edge features and a depth enhancement module (DEM) to enhance depth features. A cross-modal interaction module (CIM) additionally integrates cross-modal features from global and local contexts. Finally, a cascaded decoder refines the prediction map in a coarse-to-fine manner. In extensive experiments against 14 state-of-the-art methods, SwinEGNet achieved the best performance on the LFSD, NLPR, DES, and NJU2K datasets and comparable performance on the STEREO dataset. It also outperformed SwinNet while using only 88.4% of its parameters and 77.2% of its FLOPs. Our code will be publicly available.
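The description above names the complete pipeline: a two-stream Swin Transformer backbone, an edge extraction module (EEM), a depth enhancement module (DEM), a cross-modal interaction module (CIM), and a cascaded decoder. Below is a minimal PyTorch sketch of how those pieces might compose at a single feature scale; the module internals, the channel width c, and the one-shot decoder are illustrative assumptions, not the authors' released SwinEGNet code.

# Minimal PyTorch sketch of the pipeline described above. Module internals
# (channel width, fusion layout, single-stage decoding) are illustrative
# assumptions, not the authors' released SwinEGNet implementation.
import torch
import torch.nn as nn

class EEM(nn.Module):
    """Edge extraction module (hypothetical internals): RGB features -> edge map."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, 1, 1))
    def forward(self, f_rgb):
        return self.body(f_rgb)

class DEM(nn.Module):
    """Depth enhancement module (hypothetical): residual refinement of depth features."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))
    def forward(self, f_d):
        return f_d + self.body(f_d)

class CIM(nn.Module):
    """Cross-modal interaction module (hypothetical): edge-guided RGB/depth fusion."""
    def __init__(self, c):
        super().__init__()
        self.fuse = nn.Conv2d(2 * c + 1, c, 3, padding=1)
    def forward(self, f_rgb, f_d, edge):
        # Concatenate both modalities with the edge map, then fuse.
        return self.fuse(torch.cat([f_rgb, f_d, edge], dim=1))

class SwinEGNetSketch(nn.Module):
    """Two-stream backbone -> EEM/DEM -> CIM -> decoder (stand-in for the cascaded decoder)."""
    def __init__(self, rgb_backbone, depth_backbone, c=96):
        super().__init__()
        self.rgb_backbone = rgb_backbone      # e.g. a Swin Transformer stage
        self.depth_backbone = depth_backbone  # second Swin stream for the depth map
        self.eem, self.dem, self.cim = EEM(c), DEM(c), CIM(c)
        self.decoder = nn.Conv2d(c, 1, 1)     # the paper uses a cascaded, coarse-to-fine decoder
    def forward(self, rgb, depth):
        f_rgb = self.rgb_backbone(rgb)
        f_d = self.dem(self.depth_backbone(depth))  # enhance depth features
        edge = self.eem(f_rgb)                      # edge features guide the fusion
        return torch.sigmoid(self.decoder(self.cim(f_rgb, f_d, edge)))

# Smoke test with conv stems standing in for the Swin backbones
# (a 1-channel depth map would typically be tiled to 3 channels first):
if __name__ == "__main__":
    stem = lambda: nn.Conv2d(3, 96, kernel_size=4, stride=4)
    net = SwinEGNetSketch(stem(), stem())
    out = net(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 1, 56, 56])

In the described network, fusion and decoding would repeat across the backbone's stages to realize the coarse-to-fine refinement; this sketch collapses that cascade into a single stage for brevity.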
format Online
Article
Text
id pubmed-10650861
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10650861 2023-10-29 Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection Wang, Shuaihui Jiang, Fengyi Xu, Boqian Sensors (Basel) Article Salient object detection (SOD), which identifies the most distinctive object in a given scene, plays an important role in computer vision tasks. Most existing RGB-D SOD methods employ a CNN-based network as the backbone to extract features from RGB and depth images; however, the inherent locality of CNNs limits the performance of these methods. To tackle this issue, we propose a novel Swin Transformer-based edge guidance network (SwinEGNet) for RGB-D SOD, in which the Swin Transformer serves as a powerful feature extractor that captures the global context and an edge-guided cross-modal interaction module enhances and fuses features effectively. Specifically, we employ the Swin Transformer as the backbone to extract features from RGB images and depth maps. We then introduce an edge extraction module (EEM) to extract edge features and a depth enhancement module (DEM) to enhance depth features. A cross-modal interaction module (CIM) additionally integrates cross-modal features from global and local contexts. Finally, a cascaded decoder refines the prediction map in a coarse-to-fine manner. In extensive experiments against 14 state-of-the-art methods, SwinEGNet achieved the best performance on the LFSD, NLPR, DES, and NJU2K datasets and comparable performance on the STEREO dataset. It also outperformed SwinNet while using only 88.4% of its parameters and 77.2% of its FLOPs. Our code will be publicly available. MDPI 2023-10-29 /pmc/articles/PMC10650861/ /pubmed/37960501 http://dx.doi.org/10.3390/s23218802 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Wang, Shuaihui
Jiang, Fengyi
Xu, Boqian
Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection
title Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection
title_full Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection
title_fullStr Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection
title_full_unstemmed Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection
title_short Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection
title_sort swin transformer-based edge guidance network for rgb-d salient object detection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10650861/
https://www.ncbi.nlm.nih.gov/pubmed/37960501
http://dx.doi.org/10.3390/s23218802
work_keys_str_mv AT wangshuaihui swintransformerbasededgeguidancenetworkforrgbdsalientobjectdetection
AT jiangfengyi swintransformerbasededgeguidancenetworkforrgbdsalientobjectdetection
AT xuboqian swintransformerbasededgeguidancenetworkforrgbdsalientobjectdetection