CLIP-Based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-Modal Hashing Retrieval

With the proliferation of multi-modal data generated by various sensors, unsupervised multi-modal hashing retrieval has been extensively studied for its advantages in storage, retrieval efficiency, and label independence. However, two obstacles remain for existing unsupervised methods: (1) they cannot fully capture the complementary and co-occurrence information of multi-modal data, which leads to inaccurate similarity measures; and (2) they suffer from unbalanced multi-modal learning, and the semantic structure of the data is corrupted when hash codes are binarized. To address these obstacles, we devise an effective CLIP-based Adaptive Graph Attention Network (CAGAN) for large-scale unsupervised multi-modal hashing retrieval. First, we use the multi-modal model CLIP to extract fine-grained semantic features, mine similarity information from different perspectives of the multi-modal data, and perform similarity fusion and enhancement. In addition, we propose an adaptive graph attention network to assist the learning of hash codes: an attention mechanism learns adaptive graph similarity across modalities, and a graph convolutional network then aggregates the intrinsic neighborhood information of neighboring data nodes to generate more discriminative hash codes. Finally, we employ an iterative approximate optimization strategy to mitigate the information loss in the binarization process. Extensive experiments on three benchmark datasets demonstrate that the proposed method significantly outperforms several representative hashing methods on unsupervised multi-modal retrieval tasks.
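
The following is a minimal, illustrative PyTorch sketch of the pipeline the abstract describes (CLIP features, similarity fusion, attention-learned graph, GCN-style aggregation, relaxed binarization). It is not the authors' implementation: random tensors stand in for the CLIP image and text embeddings, and the fusion weights, attention form, layer sizes, and code length are all assumptions.

# Minimal sketch of the CAGAN idea (assumed names and sizes; not the authors' code).
import torch
import torch.nn.functional as F

n, d, k = 8, 512, 64  # batch size, CLIP feature dim, hash code length (all assumed)

# Stand-ins for CLIP image/text embeddings; the paper extracts these with CLIP.
img = F.normalize(torch.randn(n, d), dim=1)
txt = F.normalize(torch.randn(n, d), dim=1)

# Similarity fusion: average the intra-modal cosine-similarity graphs
# (equal weights are an assumption; the paper also enhances the fused graph).
S = 0.5 * (img @ img.T) + 0.5 * (txt @ txt.T)

# Adaptive graph attention: learn pairwise edge weights with scaled
# dot-product attention over fused features, then row-normalize.
W_q, W_k = torch.nn.Linear(d, d), torch.nn.Linear(d, d)
fused = 0.5 * (img + txt)
A = torch.softmax((W_q(fused) @ W_k(fused).T) / d ** 0.5, dim=1)

# One GCN-style aggregation over the learned graph, then a hash head.
# tanh(alpha * x) is a smooth surrogate for the non-differentiable sign().
W_h = torch.nn.Linear(d, k)
alpha = 1.0  # would be increased over iterations during training
H = torch.tanh(alpha * W_h(A @ fused))

# Similarity-preserving objective: code similarities should match the fused graph.
loss = F.mse_loss((H @ H.T) / k, S)

# Final binary codes at retrieval time.
B = torch.sign(H.detach())

Gradually raising alpha so that tanh(alpha * x) approaches sign(x) is one common reading of the "iterative approximate optimization strategy" the abstract mentions for reducing binarization loss; the exact schedule used in the paper is not given here.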

Bibliographic Details
Main Authors: Li, Yewen; Ge, Mingyuan; Li, Mingyong; Li, Tiansong; Xiang, Sen
Format: Online Article Text
Journal: Sensors (Basel)
Language: English
Published: MDPI, 2023-03-24
Subjects: Article
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. Open access under the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10099083/
https://www.ncbi.nlm.nih.gov/pubmed/37050499
http://dx.doi.org/10.3390/s23073439