A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval

Due to the swift growth in the scale of remote sensing imagery, scholars have progressively directed their attention towards achieving efficient and adaptable cross-modal retrieval for remote sensing images. They have also steadily tackled the distinctive challenge posed by the multi-scale attribute...

Bibliographic Details
Main Authors:	Zheng, Fuzhong; Wang, Xu; Wang, Luyao; Zhang, Xiong; Zhu, Hongze; Wang, Long; Zhang, Haisu
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10610807/ https://www.ncbi.nlm.nih.gov/pubmed/37896530 http://dx.doi.org/10.3390/s23208437

_version_	1785128343399563264
author	Zheng, Fuzhong; Wang, Xu; Wang, Luyao; Zhang, Xiong; Zhu, Hongze; Wang, Long; Zhang, Haisu
author_facet	Zheng, Fuzhong; Wang, Xu; Wang, Luyao; Zhang, Xiong; Zhu, Hongze; Wang, Long; Zhang, Haisu
author_sort	Zheng, Fuzhong
collection	PubMed
description	Due to the swift growth in the scale of remote sensing imagery, scholars have progressively directed their attention towards achieving efficient and adaptable cross-modal retrieval for remote sensing images. They have also steadily tackled the distinctive challenge posed by the multi-scale attributes of these images. However, existing studies primarily concentrate on the characterization of these features, neglecting the comprehensive investigation of the complex relationship between multi-scale targets and the semantic alignment of these targets with text. To address this issue, this study introduces a fine-grained semantic alignment method that adequately aggregates multi-scale information (referred to as FAAMI). The proposed approach comprises multiple stages. Initially, we employ a computing-friendly cross-layer feature connection method to construct a multi-scale feature representation of an image. Subsequently, we devise an efficient feature consistency enhancement module to rectify the incongruous semantic discrimination observed in cross-layer features. Finally, a shallow cross-attention network is employed to capture the fine-grained semantic relationship between multiple-scale image regions and the corresponding words in the text. Extensive experiments were conducted using two datasets: RSICD and RSITMD. The results demonstrate that the performance of FAAMI surpasses that of recently proposed advanced models in the same domain, with significant improvements observed in R@K and other evaluation metrics. Specifically, the mR values achieved by FAAMI are 23.18% and 35.99% for the two datasets, respectively.
format	Online Article Text
id	pubmed-10610807
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10610807 2023-10-28 A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval Zheng, Fuzhong; Wang, Xu; Wang, Luyao; Zhang, Xiong; Zhu, Hongze; Wang, Long; Zhang, Haisu Sensors (Basel) Article MDPI 2023-10-13 /pmc/articles/PMC10610807/ /pubmed/37896530 http://dx.doi.org/10.3390/s23208437 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Zheng, Fuzhong Wang, Xu Wang, Luyao Zhang, Xiong Zhu, Hongze Wang, Long Zhang, Haisu A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval
title	A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval
title_full	A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval
title_fullStr	A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval
title_full_unstemmed	A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval
title_short	A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval
title_sort	fine-grained semantic alignment method specific to aggregate multi-scale information for cross-modal remote sensing image retrieval
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10610807/ https://www.ncbi.nlm.nih.gov/pubmed/37896530 http://dx.doi.org/10.3390/s23208437

Similar Items