
S-MAT: Semantic-Driven Masked Attention Transformer for Multi-Label Aerial Image Classification

Multi-label aerial scene image classification is a long-standing and challenging research problem in the remote sensing field. Because land cover objects usually co-exist in an aerial scene image, modeling label dependencies is a compelling way to improve performance. Previous methods generally...


Bibliographic Details
Main Authors: Wu, Hongjun, Xu, Cheng, Liu, Hongzhe
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9317133/
https://www.ncbi.nlm.nih.gov/pubmed/35891109
http://dx.doi.org/10.3390/s22145433
_version_ 1784754982125305856
author Wu, Hongjun
Xu, Cheng
Liu, Hongzhe
author_facet Wu, Hongjun
Xu, Cheng
Liu, Hongzhe
author_sort Wu, Hongjun
collection PubMed
description Multi-label aerial scene image classification is a long-standing and challenging research problem in the remote sensing field. Because land cover objects usually co-exist in an aerial scene image, modeling label dependencies is a compelling way to improve performance. Previous methods generally model the label dependencies among all the categories in the target dataset directly. However, most of the semantic features extracted from an image relate to the objects that are actually present, so the dependencies among nonexistent categories cannot be evaluated effectively. These redundant label dependencies may introduce noise and further degrade classification performance. To solve this problem, we propose S-MAT, a Semantic-driven Masked Attention Transformer for multi-label aerial scene image classification. S-MAT adopts a Masked Attention Transformer (MAT) to capture the correlations among the label embeddings constructed by a Semantic Disentanglement Module (SDM). Moreover, the proposed masked attention in MAT can filter out the redundant dependencies and enhance the robustness of the model. As a result, the proposed method can explicitly and accurately capture the label dependencies. Our method achieves CF1s of [Formula: see text], [Formula: see text], and [Formula: see text] on three multi-label aerial scene image classification benchmark datasets: UC-Merced Multi-label, AID Multi-label, and MLRSNet, respectively. In addition, extensive ablation studies and empirical analysis are provided to demonstrate the effectiveness of the essential components of our method under different factors.
format Online
Article
Text
id pubmed-9317133
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9317133 2022-07-27 S-MAT: Semantic-Driven Masked Attention Transformer for Multi-Label Aerial Image Classification Wu, Hongjun Xu, Cheng Liu, Hongzhe Sensors (Basel) Article Multi-label aerial scene image classification is a long-standing and challenging research problem in the remote sensing field. Because land cover objects usually co-exist in an aerial scene image, modeling label dependencies is a compelling way to improve performance. Previous methods generally model the label dependencies among all the categories in the target dataset directly. However, most of the semantic features extracted from an image relate to the objects that are actually present, so the dependencies among nonexistent categories cannot be evaluated effectively. These redundant label dependencies may introduce noise and further degrade classification performance. To solve this problem, we propose S-MAT, a Semantic-driven Masked Attention Transformer for multi-label aerial scene image classification. S-MAT adopts a Masked Attention Transformer (MAT) to capture the correlations among the label embeddings constructed by a Semantic Disentanglement Module (SDM). Moreover, the proposed masked attention in MAT can filter out the redundant dependencies and enhance the robustness of the model. As a result, the proposed method can explicitly and accurately capture the label dependencies. Our method achieves CF1s of [Formula: see text], [Formula: see text], and [Formula: see text] on three multi-label aerial scene image classification benchmark datasets: UC-Merced Multi-label, AID Multi-label, and MLRSNet, respectively. In addition, extensive ablation studies and empirical analysis are provided to demonstrate the effectiveness of the essential components of our method under different factors. MDPI 2022-07-20 /pmc/articles/PMC9317133/ /pubmed/35891109 http://dx.doi.org/10.3390/s22145433 Text en © 2022 by the authors.
Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title S-MAT: Semantic-Driven Masked Attention Transformer for Multi-Label Aerial Image Classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9317133/
https://www.ncbi.nlm.nih.gov/pubmed/35891109
http://dx.doi.org/10.3390/s22145433
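
The description field above says that S-MAT applies masked attention over label embeddings so that dependencies involving absent categories are filtered out. The following is only a minimal sketch of that general idea, not the authors' implementation: this record does not state how the mask is built, so the presence scores, the 0.5 threshold, the identity query/key/value projections, and every name in the code are assumptions made for illustration.

```python
import math


def masked_self_attention(embeddings, keep):
    """Single-head self-attention over label embeddings (illustrative only).

    embeddings: list of L vectors (lists of floats), one per label.
    keep: list of L booleans; False marks a label that no query may attend to.
    Identity query/key/value projections are assumed to keep the sketch short.
    """
    num_labels = len(embeddings)
    dim = len(embeddings[0])
    scale = 1.0 / math.sqrt(dim)
    kept = [j for j, flag in enumerate(keep) if flag]
    if not kept:
        # Nothing left to attend to: return the inputs and all-zero weights.
        return [list(v) for v in embeddings], [[0.0] * num_labels for _ in embeddings]

    outputs, weights = [], []
    for query in embeddings:
        # Scaled dot-product scores against the kept labels only.
        scores = {j: scale * sum(q * k for q, k in zip(query, embeddings[j]))
                  for j in kept}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exps.values())
        # Masked labels receive exactly zero attention weight.
        row = [exps[j] / total if j in exps else 0.0 for j in range(num_labels)]
        weights.append(row)
        outputs.append([sum(row[j] * embeddings[j][c] for j in kept)
                        for c in range(dim)])
    return outputs, weights


# Hypothetical inputs: three label embeddings and per-label presence scores.
label_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
presence_scores = [0.9, 0.2, 0.7]
keep_mask = [score >= 0.5 for score in presence_scores]  # assumed threshold
attended, attention = masked_self_attention(label_embeddings, keep_mask)
```

In this toy run the second label falls below the assumed threshold, so its column of the attention matrix is zero and it contributes nothing to any attended embedding, while each row still sums to one over the remaining labels.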