Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images
Semantic segmentation with deep learning networks has become an important approach to the extraction of objects from very-high-resolution remote sensing images. Vision Transformer networks have shown significant improvements in performance compared to traditional convolutional neural networks (CNNs) in semantic segmentation.
Main Authors: | Song, Jia; Zhu, A-Xing; Zhu, Yunqiang
---|---
Format: | Online Article Text
Language: | English
Published: | MDPI, 2023
Subjects: | Article
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10255903/ https://www.ncbi.nlm.nih.gov/pubmed/37299892 http://dx.doi.org/10.3390/s23115166
Field | Value
---|---
_version_ | 1785056985415155712
author | Song, Jia; Zhu, A-Xing; Zhu, Yunqiang
author_facet | Song, Jia; Zhu, A-Xing; Zhu, Yunqiang
author_sort | Song, Jia |
collection | PubMed |
description | Semantic segmentation with deep learning networks has become an important approach to the extraction of objects from very-high-resolution (VHR) remote sensing images. Vision Transformer networks have shown significant improvements in performance over traditional convolutional neural networks (CNNs) in semantic segmentation. Vision Transformer networks have architectures that differ from those of CNNs: image patches, linear embedding, and multi-head self-attention (MHSA) introduce several of their main hyperparameters. How these hyperparameters should be configured for the extraction of objects from VHR images, and how they affect network accuracy, has not been sufficiently investigated. This article explores the role of vision Transformer networks in the extraction of building footprints from VHR images. Transformer-based models with different hyperparameter values were designed and compared, and the impact of the hyperparameters on accuracy was analyzed. The results show that smaller image patches and higher-dimensional embeddings yield better accuracy. In addition, the Transformer-based network is shown to be scalable: it can be trained on general-scale graphics processing units (GPUs) with model sizes and training times comparable to those of CNNs, while achieving higher accuracy. The study provides valuable insights into the potential of vision Transformer networks for object extraction from VHR images.
format | Online Article Text |
id | pubmed-10255903 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10255903 2023-06-10 Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images Song, Jia; Zhu, A-Xing; Zhu, Yunqiang Sensors (Basel) Article MDPI 2023-05-29 /pmc/articles/PMC10255903/ /pubmed/37299892 http://dx.doi.org/10.3390/s23115166 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle | Article; Song, Jia; Zhu, A-Xing; Zhu, Yunqiang; Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images
title | Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images |
title_full | Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images |
title_fullStr | Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images |
title_full_unstemmed | Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images |
title_short | Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images |
title_sort | transformer-based semantic segmentation for extraction of building footprints from very-high-resolution images |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10255903/ https://www.ncbi.nlm.nih.gov/pubmed/37299892 http://dx.doi.org/10.3390/s23115166 |
work_keys_str_mv | AT songjia transformerbasedsemanticsegmentationforextractionofbuildingfootprintsfromveryhighresolutionimages AT zhuaxing transformerbasedsemanticsegmentationforextractionofbuildingfootprintsfromveryhighresolutionimages AT zhuyunqiang transformerbasedsemanticsegmentationforextractionofbuildingfootprintsfromveryhighresolutionimages |