CLIP-Based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-Modal Hashing Retrieval

With the proliferation of multi-modal data generated by various sensors, unsupervised multi-modal hashing retrieval has been extensively studied for its advantages in storage, retrieval efficiency, and label independence. However, two obstacles remain for existing unsupervised methods: (1) they cannot fully capture the complementary and co-occurrence information of multi-modal data, which leads to inaccurate similarity measures; and (2) they suffer from unbalanced multi-modal learning, and the semantic structure of the data is corrupted when hash codes are binarized. To address these obstacles, we devise an effective CLIP-based Adaptive Graph Attention Network (CAGAN) for large-scale unsupervised multi-modal hashing retrieval. First, we use the multi-modal model CLIP to extract fine-grained semantic features, mine similarity information from different perspectives of the multi-modal data, and perform similarity fusion and enhancement. In addition, we propose an adaptive graph attention network to assist the learning of hash codes: an attention mechanism learns adaptive graph similarity across modalities, and a graph convolutional network then aggregates the intrinsic neighborhood information of neighboring data nodes to generate more discriminative hash codes. Finally, we employ an iterative approximate optimization strategy to mitigate the information loss in the binarization process. Extensive experiments on three benchmark datasets demonstrate that the proposed method significantly outperforms several representative hashing methods on unsupervised multi-modal retrieval tasks.
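
The following is a minimal, illustrative PyTorch sketch of the pipeline the abstract describes (CLIP features, similarity fusion, attention-learned graph, GCN-style aggregation, relaxed binarization). It is not the authors' implementation: random tensors stand in for the CLIP image and text embeddings, and the fusion weights, attention form, layer sizes, and code length are all assumptions.

# Minimal sketch of the CAGAN idea (assumed names and sizes; not the authors' code).
import torch
import torch.nn.functional as F

n, d, k = 8, 512, 64  # batch size, CLIP feature dim, hash code length (all assumed)

# Stand-ins for CLIP image/text embeddings; the paper extracts these with CLIP.
img = F.normalize(torch.randn(n, d), dim=1)
txt = F.normalize(torch.randn(n, d), dim=1)

# Similarity fusion: average the intra-modal cosine-similarity graphs
# (equal weights are an assumption; the paper also enhances the fused graph).
S = 0.5 * (img @ img.T) + 0.5 * (txt @ txt.T)

# Adaptive graph attention: learn pairwise edge weights with scaled
# dot-product attention over fused features, then row-normalize.
W_q, W_k = torch.nn.Linear(d, d), torch.nn.Linear(d, d)
fused = 0.5 * (img + txt)
A = torch.softmax((W_q(fused) @ W_k(fused).T) / d ** 0.5, dim=1)

# One GCN-style aggregation over the learned graph, then a hash head.
# tanh(alpha * x) is a smooth surrogate for the non-differentiable sign().
W_h = torch.nn.Linear(d, k)
alpha = 1.0  # would be increased over iterations during training
H = torch.tanh(alpha * W_h(A @ fused))

# Similarity-preserving objective: code similarities should match the fused graph.
loss = F.mse_loss((H @ H.T) / k, S)

# Final binary codes at retrieval time.
B = torch.sign(H.detach())

Gradually raising alpha so that tanh(alpha * x) approaches sign(x) is one common reading of the "iterative approximate optimization strategy" the abstract mentions for reducing binarization loss; the exact schedule used in the paper is not given here.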

Bibliographic Details
Main Authors: Li, Yewen; Ge, Mingyuan; Li, Mingyong; Li, Tiansong; Xiang, Sen
Format: Online Article Text
Journal: Sensors (Basel)
Language: English
Published: MDPI, 2023-03-24
Subjects: Article
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. Open access under the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10099083/
https://www.ncbi.nlm.nih.gov/pubmed/37050499
http://dx.doi.org/10.3390/s23073439