Hybrid Attention Network for Language-Based Person Search

Language-based person search retrieves images of a target person from a natural language description and is a challenging fine-grained cross-modal retrieval task. A novel hybrid attention network is proposed for the task. The network includes the following three aspects: First, a cubic attention mech...

Bibliographic Details
Main Authors:	Li, Yang; Xu, Huahu; Xiao, Junsheng
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7570628/ https://www.ncbi.nlm.nih.gov/pubmed/32942720 http://dx.doi.org/10.3390/s20185279

_version_	1783596990834147328
author	Li, Yang; Xu, Huahu; Xiao, Junsheng
author_facet	Li, Yang; Xu, Huahu; Xiao, Junsheng
author_sort	Li, Yang
collection	PubMed
description	Language-based person search retrieves images of a target person from a natural language description and is a challenging fine-grained cross-modal retrieval task. A novel hybrid attention network is proposed for the task. The network has three components. First, a cubic attention mechanism for person images combines cross-layer spatial attention and channel attention; it exploits both important mid-level details and key high-level semantics to obtain a more discriminative fine-grained feature representation of a person image. Second, a text attention network for the language description is based on a bidirectional LSTM (BiLSTM) and a self-attention mechanism; it learns bidirectional semantic dependencies and captures the key words of sentences, extracting the context information and key semantic features of the description more effectively and accurately. Third, a cross-modal attention mechanism and a joint loss function for cross-modal learning focus on the relevant parts of the text and image features; they better exploit both cross-modal and intra-modal correlation and better address the problem of cross-modal heterogeneity. Extensive experiments were conducted on the CUHK-PEDES dataset. Our approach obtains higher performance than state-of-the-art approaches, demonstrating its advantage.
format	Online Article Text
id	pubmed-7570628
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7570628 2020-10-28 Hybrid Attention Network for Language-Based Person Search Li, Yang; Xu, Huahu; Xiao, Junsheng Sensors (Basel) Article MDPI 2020-09-15 /pmc/articles/PMC7570628/ /pubmed/32942720 http://dx.doi.org/10.3390/s20185279 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Li, Yang Xu, Huahu Xiao, Junsheng Hybrid Attention Network for Language-Based Person Search
title	Hybrid Attention Network for Language-Based Person Search
title_sort	hybrid attention network for language-based person search
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7570628/ https://www.ncbi.nlm.nih.gov/pubmed/32942720 http://dx.doi.org/10.3390/s20185279
work_keys_str_mv	AT liyang hybridattentionnetworkforlanguagebasedpersonsearch AT xuhuahu hybridattentionnetworkforlanguagebasedpersonsearch AT xiaojunsheng hybridattentionnetworkforlanguagebasedpersonsearch