CLIP-Based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-Modal Hashing Retrieval
Main authors: | Li, Yewen; Ge, Mingyuan; Li, Mingyong; Li, Tiansong; Xiang, Sen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10099083/ https://www.ncbi.nlm.nih.gov/pubmed/37050499 http://dx.doi.org/10.3390/s23073439 |
author | Li, Yewen; Ge, Mingyuan; Li, Mingyong; Li, Tiansong; Xiang, Sen |
---|---|
collection | PubMed |
description | With the proliferation of multi-modal data generated by various sensors, unsupervised multi-modal hashing retrieval has been extensively studied because of its advantages in storage, retrieval efficiency, and independence from labels. However, existing unsupervised methods still face two obstacles: (1) they cannot fully capture the complementary and co-occurrence information of multi-modal data and therefore rely on inaccurate similarity measures; (2) they suffer from unbalanced multi-modal learning, and the semantic structure of the data is corrupted during hash code binarization. To address these obstacles, we devise an effective CLIP-based Adaptive Graph Attention Network (CAGAN) for large-scale unsupervised multi-modal hashing retrieval. First, we use the multi-modal model CLIP to extract fine-grained semantic features, mine similarity information from different perspectives of the multi-modal data, and perform similarity fusion and enhancement. In addition, this paper proposes an adaptive graph attention network to assist hash code learning: it uses an attention mechanism to learn adaptive graph similarity across modalities, and it aggregates the intrinsic neighborhood information of neighboring data nodes through a graph convolutional network to generate more discriminative hash codes. Finally, this paper employs an iterative approximate optimization strategy to mitigate information loss in the binarization process. Extensive experiments on three benchmark datasets demonstrate that the proposed method significantly outperforms several representative hashing methods in unsupervised multi-modal retrieval tasks. |
format | Online Article Text |
id | pubmed-10099083 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
source | Sensors (Basel), Article. MDPI, 2023-03-24. PubMed record pubmed-10099083, 2023-04-14. |
rights | © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | CLIP-Based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-Modal Hashing Retrieval |
topic | Article |
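
The description field outlines the CAGAN pipeline only in general terms: CLIP features, cross-modal similarity fusion, graph-based neighborhood aggregation, binarization, and Hamming-distance retrieval. The Python sketch below is not the authors' implementation. It is a minimal illustration under stated assumptions: the CLIP features are replaced by made-up toy vectors; a fixed weighted sum (`alpha`) stands in for the paper's similarity fusion and enhancement; a parameter-free top-k neighborhood average stands in for the adaptive graph attention network; and a seeded random projection followed by `sign()` stands in for the learned hash functions and the iterative approximate optimization. Every function name here (`fuse`, `knn_graph`, `propagate`, `hash_codes`) is hypothetical, and each image-text pair receives a single joint code as a simplification.

```python
import math
import random


def cosine_similarity(rows):
    """Pairwise cosine-similarity matrix for a list of feature vectors."""
    unit = []
    for r in rows:
        norm = math.sqrt(sum(v * v for v in r)) or 1.0
        unit.append([v / norm for v in r])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def fuse(s_img, s_txt, alpha=0.5):
    """Hypothetical fusion: convex combination of image and text similarity matrices."""
    n = len(s_img)
    return [[alpha * s_img[i][j] + (1 - alpha) * s_txt[i][j] for j in range(n)]
            for i in range(n)]


def knn_graph(sim, k):
    """Keep each node's k most similar nodes (itself included) and row-normalize."""
    n = len(sim)
    adj = []
    for i in range(n):
        top = sorted(range(n), key=lambda j: sim[i][j], reverse=True)[:k]
        weights = {j: max(sim[i][j], 0.0) for j in top}  # drop negative similarities
        total = sum(weights.values()) or 1.0
        adj.append([weights.get(j, 0.0) / total for j in range(n)])
    return adj


def propagate(adj, feats):
    """One parameter-free graph-convolution step: weighted average of neighbor features."""
    n, d = len(feats), len(feats[0])
    return [[sum(adj[i][j] * feats[j][c] for j in range(n)) for c in range(d)]
            for i in range(n)]


def hash_codes(feats, n_bits, seed=0):
    """Stand-in for a learned hash layer: seeded random projection, then sign()."""
    rng = random.Random(seed)
    d = len(feats[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_bits)]
    return [[1 if sum(p[c] * f[c] for c in range(d)) >= 0 else -1 for p in planes]
            for f in feats]


def hamming(a, b):
    """Number of positions where two {-1, +1} codes disagree."""
    return sum(x != y for x, y in zip(a, b))


# Toy "CLIP-like" features for four image-text pairs (made-up numbers, not CLIP output).
img = [[1.0, 0.1, 0.0], [1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.9, 0.3]]
txt = [[0.9, 0.0], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]]

fused = fuse(cosine_similarity(img), cosine_similarity(txt), alpha=0.5)
adj = knn_graph(fused, k=2)
nodes = [a + b for a, b in zip(img, txt)]  # one joint feature vector per image-text pair
codes = hash_codes(propagate(adj, nodes), n_bits=16)

# Rank the remaining items by Hamming distance to item 0 (the query).
ranking = sorted(range(1, len(codes)), key=lambda j: hamming(codes[0], codes[j]))
print("codes[0]:", codes[0])
print("ranking for query 0:", ranking)
```

Because items 0 and 1 have identical toy features, they share a neighborhood and end up with identical codes, so item 1 ranks first for query 0. Ranking by Hamming distance over short binary codes is what gives hashing methods their storage and retrieval-speed advantage; in the actual method the projection and codes are learned rather than random.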