Cargando…

A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval

Due to the swift growth in the scale of remote sensing imagery, scholars have progressively directed their attention towards achieving efficient and adaptable cross-modal retrieval for remote sensing images. They have also steadily tackled the distinctive challenge posed by the multi-scale attribute...

Descripción completa

Detalles Bibliográficos
Autores principales:	Zheng, Fuzhong, Wang, Xu, Wang, Luyao, Zhang, Xiong, Zhu, Hongze, Wang, Long, Zhang, Haisu
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	MDPI 2023
Materias:	Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10610807/ https://www.ncbi.nlm.nih.gov/pubmed/37896530 http://dx.doi.org/10.3390/s23208437

Descripción
Sumario:	Due to the swift growth in the scale of remote sensing imagery, scholars have progressively directed their attention towards achieving efficient and adaptable cross-modal retrieval for remote sensing images. They have also steadily tackled the distinctive challenge posed by the multi-scale attributes of these images. However, existing studies primarily concentrate on the characterization of these features, neglecting the comprehensive investigation of the complex relationship between multi-scale targets and the semantic alignment of these targets with text. To address this issue, this study introduces a fine-grained semantic alignment method that adequately aggregates multi-scale information (referred to as FAAMI). The proposed approach comprises multiple stages. Initially, we employ a computing-friendly cross-layer feature connection method to construct a multi-scale feature representation of an image. Subsequently, we devise an efficient feature consistency enhancement module to rectify the incongruous semantic discrimination observed in cross-layer features. Finally, a shallow cross-attention network is employed to capture the fine-grained semantic relationship between multiple-scale image regions and the corresponding words in the text. Extensive experiments were conducted using two datasets: RSICD and RSITMD. The results demonstrate that the performance of FAAMI surpasses that of recently proposed advanced models in the same domain, with significant improvements observed in R@K and other evaluation metrics. Specifically, the mR values achieved by FAAMI are 23.18% and 35.99% for the two datasets, respectively.

A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval

Ejemplares similares