HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval

Image-text retrieval aims to search related results of one modality by querying another modality. As a fundamental and key problem in cross-modal retrieval, image-text retrieval is still a challenging problem owing to the complementary and imbalanced relationship between different modalities (i.e.,...

Bibliographic Details
Main Authors: Wang, Shuhuai; Liu, Zheng; Pei, Xinlei; Xu, Junhao
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10007124/
https://www.ncbi.nlm.nih.gov/pubmed/36904776
http://dx.doi.org/10.3390/s23052559
author Wang, Shuhuai
Liu, Zheng
Pei, Xinlei
Xu, Junhao
collection PubMed
description Image-text retrieval aims to search related results of one modality by querying another modality. As a fundamental and key problem in cross-modal retrieval, image-text retrieval is still a challenging problem owing to the complementary and imbalanced relationship between different modalities (i.e., Image and Text) and different granularities (i.e., Global-level and Local-level). However, existing works have not fully considered how to effectively mine and fuse the complementarities between images and texts at different granularities. Therefore, in this paper, we propose a hierarchical adaptive alignment network, whose contributions are as follows: (1) We propose a multi-level alignment network, which simultaneously mines global-level and local-level data, thereby enhancing the semantic association between images and texts. (2) We propose an adaptive weighted loss to flexibly optimize the image-text similarity with two stages in a unified framework. (3) We conduct extensive experiments on three public benchmark datasets (Corel 5K, Pascal Sentence, and Wiki) and compare them with eleven state-of-the-art methods. The experimental results thoroughly verify the effectiveness of our proposed method.
format Online
Article
Text
id pubmed-10007124
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10007124 2023-03-12 HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval Wang, Shuhuai Liu, Zheng Pei, Xinlei Xu, Junhao Sensors (Basel) Article MDPI 2023-02-25 /pmc/articles/PMC10007124/ /pubmed/36904776 http://dx.doi.org/10.3390/s23052559 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval
topic Article