Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration

We address and formalise the task of sequence-to-sequence (seq2seq) cross-modal retrieval. Given a sequence of text passages as query, the goal is to retrieve a sequence of images that best describes and aligns with the query. This new task extends the traditional cross-modal retrieval, where each i...

Bibliographic Details
Main Authors:	Batra, Vishwash; Haldar, Aparajita; He, Yulan; Ferhatosmanoglu, Hakan; Vogiatzis, George; Guha, Tanaya
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148232/ http://dx.doi.org/10.1007/978-3-030-45439-5_4

_version_	1783520549358534656
author	Batra, Vishwash Haldar, Aparajita He, Yulan Ferhatosmanoglu, Hakan Vogiatzis, George Guha, Tanaya
author_sort	Batra, Vishwash
collection	PubMed
description	We address and formalise the task of sequence-to-sequence (seq2seq) cross-modal retrieval. Given a sequence of text passages as query, the goal is to retrieve a sequence of images that best describes and aligns with the query. This new task extends traditional cross-modal retrieval, where each image-text pair is treated independently, ignoring broader context. We propose a novel variational recurrent seq2seq (VRSS) retrieval model for this seq2seq task. Unlike most cross-modal methods, we generate an image vector corresponding to the latent topic obtained from combining the text semantics and context. This synthetic image embedding point associated with every text embedding point can then be employed for either image generation or image retrieval as desired. We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images is retrieved to best match the steps described in the text. To this end, we build and release a new Stepwise Recipe dataset for research purposes, containing 10K recipes (sequences of image-text pairs) having a total of 67K image-text pairs. To our knowledge, it is the first publicly available dataset to offer rich semantic descriptions in a focused category such as food or recipes. Our model is shown to outperform several competitive and relevant baselines in the experiments. We also provide qualitative analysis of how semantically meaningful the results produced by our model are through human evaluation and comparison with relevant existing methods.
format	Online Article Text
id	pubmed-7148232
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-7148232 2020-04-13 Advances in Information Retrieval Article 2020-03-17 /pmc/articles/PMC7148232/ http://dx.doi.org/10.1007/978-3-030-45439-5_4 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148232/ http://dx.doi.org/10.1007/978-3-030-45439-5_4
