HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval

Bibliographic Details
Main Authors:	Wang, Shuhuai; Liu, Zheng; Pei, Xinlei; Xu, Junhao
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10007124/ https://www.ncbi.nlm.nih.gov/pubmed/36904776 http://dx.doi.org/10.3390/s23052559

_version_	1784905440762527744
author	Wang, Shuhuai; Liu, Zheng; Pei, Xinlei; Xu, Junhao
collection	PubMed
description	Image-text retrieval aims to search related results of one modality by querying another modality. As a fundamental and key problem in cross-modal retrieval, image-text retrieval is still a challenging problem owing to the complementary and imbalanced relationship between different modalities (i.e., Image and Text) and different granularities (i.e., Global-level and Local-level). However, existing works have not fully considered how to effectively mine and fuse the complementarities between images and texts at different granularities. Therefore, in this paper, we propose a hierarchical adaptive alignment network, whose contributions are as follows: (1) We propose a multi-level alignment network, which simultaneously mines global-level and local-level data, thereby enhancing the semantic association between images and texts. (2) We propose an adaptive weighted loss to flexibly optimize the image-text similarity with two stages in a unified framework. (3) We conduct extensive experiments on three public benchmark datasets (Corel 5K, Pascal Sentence, and Wiki) and compare them with eleven state-of-the-art methods. The experimental results thoroughly verify the effectiveness of our proposed method.
format	Online Article Text
id	pubmed-10007124
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10007124 2023-03-12 HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval Wang, Shuhuai; Liu, Zheng; Pei, Xinlei; Xu, Junhao Sensors (Basel) Article MDPI 2023-02-25 /pmc/articles/PMC10007124/ /pubmed/36904776 http://dx.doi.org/10.3390/s23052559 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10007124/ https://www.ncbi.nlm.nih.gov/pubmed/36904776 http://dx.doi.org/10.3390/s23052559

Similar Items