
Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery

Bibliographic Details
Main Authors: Xu, Yufen, Zhou, Shangbo, Huang, Yuhui
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9689728/
https://www.ncbi.nlm.nih.gov/pubmed/36359709
http://dx.doi.org/10.3390/e24111619
_version_ 1784836609215037440
author Xu, Yufen
Zhou, Shangbo
Huang, Yuhui
author_facet Xu, Yufen
Zhou, Shangbo
Huang, Yuhui
author_sort Xu, Yufen
collection PubMed
description Convolutional neural networks have long dominated semantic segmentation of very-high-resolution (VHR) remote sensing (RS) images. However, restricted by the fixed receptive field of the convolution operation, convolution-based models cannot directly capture global contextual information. Meanwhile, the Swin Transformer possesses great potential for modeling long-range dependencies. Nevertheless, the Swin Transformer breaks images into patches that are flattened into one-dimensional sequences, without considering the loss of positional information inside each patch. Therefore, inspired by the Swin Transformer and Unet, we propose SUD-Net (Swin-Transformer-based Unet-like Network with a Dynamic Attention Pyramid Head), a new U-shaped architecture that combines Swin Transformer blocks and convolution layers through a dual encoder and an upsampling decoder, with a Dynamic Attention Pyramid Head (DAPH) attached to the backbone. First, we propose a dual-encoder structure that combines Swin Transformer blocks and reslayers in reverse order to complement global semantics with detailed representations. Second, to address the loss of spatial information inside each patch, we design a Multi-Path Fusion Model (MPFM) with a specially devised Patch Attention (PA) mechanism that encodes the position information of patches and adaptively fuses features of different scales through attention. Third, a Dynamic Attention Pyramid Head is constructed with deformable convolution to dynamically aggregate effective and important semantic information. SUD-Net achieves exceptional results on the ISPRS Potsdam and Vaihingen datasets, with 92.51% mF1, 86.4% mIoU, and 92.98% OA on Potsdam and 89.49% mF1, 81.26% mIoU, and 90.95% OA on Vaihingen.
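
The record only states that Patch Attention (PA) encodes the position information of patches and re-weights multi-scale features through attention; the exact design is not given here. The following PyTorch sketch is a hypothetical illustration of that idea only: the class name, the learnable per-patch positional embedding, and the softmax-over-patches weighting are all assumptions, not the paper's implementation.

    # Hypothetical Patch-Attention-style block: position-aware re-weighting of patch tokens.
    import torch
    import torch.nn as nn

    class PatchAttention(nn.Module):
        """Re-weights patch features with a learned, position-aware attention map."""
        def __init__(self, channels: int, num_patches: int):
            super().__init__()
            # Learnable positional embedding, one vector per patch location.
            self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, channels))
            self.to_attn = nn.Sequential(
                nn.Linear(channels, channels // 4),
                nn.GELU(),
                nn.Linear(channels // 4, 1),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, num_patches, channels) token sequence from one encoder stage.
            attn = self.to_attn(x + self.pos_embed)   # (B, N, 1) score per patch
            attn = torch.softmax(attn, dim=1)         # attention distribution over patches
            return x * attn * x.shape[1]              # rescale so feature magnitudes stay comparable

    if __name__ == "__main__":
        tokens = torch.randn(2, 64, 96)               # e.g. an 8x8 grid of 96-dim patch tokens
        pa = PatchAttention(channels=96, num_patches=64)
        print(pa(tokens).shape)                       # torch.Size([2, 64, 96])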
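The abstract says the DAPH is "constructed with deformable convolution to dynamically aggregate effective and important semantic information", but does not describe the pyramid or attention wiring. The sketch below, a minimal sketch assuming a single-level head, only illustrates the deformable-convolution part using torchvision's DeformConv2d; the class and branch names are invented for illustration.

    # Hypothetical deformable-convolution prediction head in the spirit of the DAPH.
    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class DeformableHead(nn.Module):
        def __init__(self, in_channels: int, num_classes: int, kernel_size: int = 3):
            super().__init__()
            # The offset branch predicts a 2-D sampling offset for every kernel position,
            # letting the head aggregate semantics from spatially adaptive locations.
            self.offset = nn.Conv2d(in_channels, 2 * kernel_size * kernel_size,
                                    kernel_size, padding=kernel_size // 2)
            self.deform = DeformConv2d(in_channels, in_channels,
                                       kernel_size, padding=kernel_size // 2)
            self.classify = nn.Conv2d(in_channels, num_classes, 1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            offsets = self.offset(x)                   # (B, 2*k*k, H, W) sampling offsets
            x = torch.relu(self.deform(x, offsets))    # deformable feature aggregation
            return self.classify(x)                    # per-pixel class logits

    if __name__ == "__main__":
        feats = torch.randn(1, 256, 64, 64)            # fused decoder features (assumed shape)
        head = DeformableHead(in_channels=256, num_classes=6)  # ISPRS labels use 6 classes
        print(head(feats).shape)                       # torch.Size([1, 6, 64, 64])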
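For reference, the reported scores (mF1, mIoU, OA) follow the standard definitions computed from a per-class confusion matrix; this small NumPy sketch shows those definitions and is not code from the paper.

    # Standard segmentation metrics from a confusion matrix: mean F1, mean IoU, overall accuracy.
    import numpy as np

    def segmentation_metrics(conf: np.ndarray):
        """conf[i, j] = number of pixels with ground-truth class i predicted as class j."""
        tp = np.diag(conf).astype(float)
        fp = conf.sum(axis=0) - tp
        fn = conf.sum(axis=1) - tp
        f1 = 2 * tp / (2 * tp + fp + fn)   # per-class F1
        iou = tp / (tp + fp + fn)          # per-class IoU
        oa = tp.sum() / conf.sum()         # overall accuracy
        return f1.mean(), iou.mean(), oa

    if __name__ == "__main__":
        conf = np.array([[50, 2, 1], [3, 40, 2], [0, 1, 60]])  # toy 3-class example
        mf1, miou, oa = segmentation_metrics(conf)
        print(f"mF1={mf1:.4f}, mIoU={miou:.4f}, OA={oa:.4f}")
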
format Online
Article
Text
id pubmed-9689728
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9689728 2022-11-25 Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery Xu, Yufen Zhou, Shangbo Huang, Yuhui Entropy (Basel) Article Convolutional neural networks have long dominated semantic segmentation of very-high-resolution (VHR) remote sensing (RS) images. However, restricted by the fixed receptive field of the convolution operation, convolution-based models cannot directly capture global contextual information. Meanwhile, the Swin Transformer possesses great potential for modeling long-range dependencies. Nevertheless, the Swin Transformer breaks images into patches that are flattened into one-dimensional sequences, without considering the loss of positional information inside each patch. Therefore, inspired by the Swin Transformer and Unet, we propose SUD-Net (Swin-Transformer-based Unet-like Network with a Dynamic Attention Pyramid Head), a new U-shaped architecture that combines Swin Transformer blocks and convolution layers through a dual encoder and an upsampling decoder, with a Dynamic Attention Pyramid Head (DAPH) attached to the backbone. First, we propose a dual-encoder structure that combines Swin Transformer blocks and reslayers in reverse order to complement global semantics with detailed representations. Second, to address the loss of spatial information inside each patch, we design a Multi-Path Fusion Model (MPFM) with a specially devised Patch Attention (PA) mechanism that encodes the position information of patches and adaptively fuses features of different scales through attention. Third, a Dynamic Attention Pyramid Head is constructed with deformable convolution to dynamically aggregate effective and important semantic information. SUD-Net achieves exceptional results on the ISPRS Potsdam and Vaihingen datasets, with 92.51% mF1, 86.4% mIoU, and 92.98% OA on Potsdam and 89.49% mF1, 81.26% mIoU, and 90.95% OA on Vaihingen. MDPI 2022-11-06 /pmc/articles/PMC9689728/ /pubmed/36359709 http://dx.doi.org/10.3390/e24111619 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Xu, Yufen
Zhou, Shangbo
Huang, Yuhui
Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery
title Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery
title_full Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery
title_fullStr Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery
title_full_unstemmed Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery
title_short Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery
title_sort transformer-based model with dynamic attention pyramid head for semantic segmentation of vhr remote sensing imagery
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9689728/
https://www.ncbi.nlm.nih.gov/pubmed/36359709
http://dx.doi.org/10.3390/e24111619
work_keys_str_mv AT xuyufen transformerbasedmodelwithdynamicattentionpyramidheadforsemanticsegmentationofvhrremotesensingimagery
AT zhoushangbo transformerbasedmodelwithdynamicattentionpyramidheadforsemanticsegmentationofvhrremotesensingimagery
AT huangyuhui transformerbasedmodelwithdynamicattentionpyramidheadforsemanticsegmentationofvhrremotesensingimagery