
Learning to Rank Images with Cross-Modal Graph Convolutions

We are interested in the problem of cross-modal retrieval for web image search, where the goal is to retrieve images relevant to a text query. While most of the current approaches for cross-modal retrieval revolve around learning how to represent text and images in a shared latent space, we take a different direction: we propose to generalize the cross-modal relevance feedback mechanism, a simple yet effective unsupervised method that relies on standard information retrieval heuristics and the choice of a few hyper-parameters. We show that we can cast it as a supervised representation learning problem on graphs, using graph convolutions operating jointly over text and image features, namely cross-modal graph convolutions. The proposed architecture directly learns how to combine image and text features for the ranking task, while taking into account the context given by all the other elements in the set of images to be (re-)ranked. We validate our approach on two datasets: a public dataset from a MediaEval challenge, and a small sample of proprietary image search query logs, referred to as WebQ. Our experiments demonstrate that our model improves over standard baselines.


Bibliographic Details
Main Authors: Formal, Thibault, Clinchant, Stéphane, Renders, Jean-Michel, Lee, Sooyeol, Cho, Geun Hee
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148208/
http://dx.doi.org/10.1007/978-3-030-45439-5_39
_version_ 1783520543697272832
author Formal, Thibault
Clinchant, Stéphane
Renders, Jean-Michel
Lee, Sooyeol
Cho, Geun Hee
author_facet Formal, Thibault
Clinchant, Stéphane
Renders, Jean-Michel
Lee, Sooyeol
Cho, Geun Hee
author_sort Formal, Thibault
collection PubMed
description We are interested in the problem of cross-modal retrieval for web image search, where the goal is to retrieve images relevant to a text query. While most of the current approaches for cross-modal retrieval revolve around learning how to represent text and images in a shared latent space, we take a different direction: we propose to generalize the cross-modal relevance feedback mechanism, a simple yet effective unsupervised method that relies on standard information retrieval heuristics and the choice of a few hyper-parameters. We show that we can cast it as a supervised representation learning problem on graphs, using graph convolutions operating jointly over text and image features, namely cross-modal graph convolutions. The proposed architecture directly learns how to combine image and text features for the ranking task, while taking into account the context given by all the other elements in the set of images to be (re-)ranked. We validate our approach on two datasets: a public dataset from a MediaEval challenge, and a small sample of proprietary image search query logs, referred to as WebQ. Our experiments demonstrate that our model improves over standard baselines.
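The cross-modal convolution idea from the abstract can be sketched in a few lines. The sketch below is illustrative only, not the authors' model: it assumes a top-k neighbourhood graph over the candidate images, with edges weighted by similarity in the *text* modality, along which *image* features are averaged — so each candidate's visual representation is re-estimated from the context of the other candidates to be re-ranked. The function name `cross_modal_graph_conv` and the `top_k` parameter are invented for this example.

```python
import numpy as np

def cross_modal_graph_conv(text_feats, image_feats, top_k=10):
    """One cross-modal graph convolution step (illustrative sketch).

    Edges come from similarity in one modality (text); the features
    propagated along those edges come from the other (image) -- the
    'cross-modal' part.

    text_feats:  (n, d_t) array, one text embedding per candidate image
    image_feats: (n, d_v) array, one visual embedding per candidate
    Returns an (n, d_v) array of context-aware image representations.
    """
    # Cosine similarity between candidates in the text modality.
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sim = t @ t.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops

    # Sparsify: keep only each node's top-k most similar neighbours.
    adj = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :top_k]
    rows = np.arange(sim.shape[0])[:, None]
    adj[rows, idx] = sim[rows, idx]
    adj = np.maximum(adj, 0.0)  # drop negative-similarity edges

    # Row-normalise so each node averages its neighbours' image features.
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    return (adj / deg) @ image_feats
```

In a supervised setting such as the one the abstract describes, this aggregation step would be wrapped in a learnable layer (e.g. a projection of the aggregated features) and trained end-to-end with a ranking loss; here the adjacency construction alone shows how text and image features interact across the candidate set.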
format Online
Article
Text
id pubmed-7148208
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7148208 2020-04-13 Learning to Rank Images with Cross-Modal Graph Convolutions Formal, Thibault Clinchant, Stéphane Renders, Jean-Michel Lee, Sooyeol Cho, Geun Hee Advances in Information Retrieval Article 2020-03-17 /pmc/articles/PMC7148208/ http://dx.doi.org/10.1007/978-3-030-45439-5_39 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Formal, Thibault
Clinchant, Stéphane
Renders, Jean-Michel
Lee, Sooyeol
Cho, Geun Hee
Learning to Rank Images with Cross-Modal Graph Convolutions
title Learning to Rank Images with Cross-Modal Graph Convolutions
title_full Learning to Rank Images with Cross-Modal Graph Convolutions
title_fullStr Learning to Rank Images with Cross-Modal Graph Convolutions
title_full_unstemmed Learning to Rank Images with Cross-Modal Graph Convolutions
title_short Learning to Rank Images with Cross-Modal Graph Convolutions
title_sort learning to rank images with cross-modal graph convolutions
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148208/
http://dx.doi.org/10.1007/978-3-030-45439-5_39
work_keys_str_mv AT formalthibault learningtorankimageswithcrossmodalgraphconvolutions
AT clinchantstephane learningtorankimageswithcrossmodalgraphconvolutions
AT rendersjeanmichel learningtorankimageswithcrossmodalgraphconvolutions
AT leesooyeol learningtorankimageswithcrossmodalgraphconvolutions
AT chogeunhee learningtorankimageswithcrossmodalgraphconvolutions