
More to diverse: Generating diversified responses in a task oriented multimodal dialog system


Bibliographic Details
Main Authors: Firdaus, Mauajama; Pratap Shandeelya, Arunav; Ekbal, Asif
Format: Online Article Text
Language: English
Published: Public Library of Science, 2020
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7644051/
https://www.ncbi.nlm.nih.gov/pubmed/33151948
http://dx.doi.org/10.1371/journal.pone.0241271
_version_ 1783606386832179200
author Firdaus, Mauajama
Pratap Shandeelya, Arunav
Ekbal, Asif
author_facet Firdaus, Mauajama
Pratap Shandeelya, Arunav
Ekbal, Asif
author_sort Firdaus, Mauajama
collection PubMed
description Multimodal dialogue systems, owing to their manifold applications, have gained much attention from researchers and developers in recent times. With the release of the large-scale multimodal dialog dataset of Saha et al. 2018 on the fashion domain, it has become possible to investigate dialogue systems having both textual and visual modalities. Response generation is an essential aspect of every dialogue system, and making the responses diverse is an important problem. For any goal-oriented conversational agent, the system's responses must be informative, diverse and polite, as this can lead to better user experiences. In this paper, we propose an end-to-end neural framework for generating varied responses in a multimodal dialogue setup, capturing information from both the text and the image. A multimodal encoder with co-attention between the text and the image is used to focus on the different modalities and obtain better contextual information. For effective information sharing across the modalities, we combine the information of text and images using the BLOCK fusion technique, which helps in learning an improved multimodal representation. We employ stochastic beam search with the Gumbel Top-k trick to achieve diversified responses while preserving the content and politeness of the responses. Experimental results show that our proposed approach performs significantly better than the existing and baseline methods in terms of distinct metrics, and thereby generates more diverse responses that are informative, interesting and polite without any loss of information. Empirical evaluation also reveals that images, when used along with the text, improve the effectiveness of the model in generating diversified responses.
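Note on the method: the stochastic beam search mentioned in the abstract relies on the Gumbel Top-k trick, a way of sampling k items without replacement by perturbing log-probabilities with Gumbel noise and keeping the k largest perturbed scores. The sketch below is a minimal illustration of that trick in Python/NumPy, not the authors' implementation; in stochastic beam search the same perturbation is applied to partial-sequence scores at each decoding step rather than to a single token distribution, and the function name and toy vocabulary here are illustrative only.

import numpy as np

def gumbel_top_k(log_probs, k, rng=None):
    # Sample k distinct indices without replacement from a categorical
    # distribution given by (possibly unnormalized) log-probabilities:
    # add i.i.d. Gumbel(0, 1) noise and keep the k largest perturbed scores.
    rng = np.random.default_rng() if rng is None else rng
    gumbel_noise = rng.gumbel(size=len(log_probs))
    perturbed = np.asarray(log_probs) + gumbel_noise
    # Indices of the k largest perturbed scores, in descending order.
    return np.argsort(perturbed)[::-1][:k]

# Toy usage: pick 3 distinct "next tokens" from a 5-token vocabulary.
vocab_log_probs = np.log(np.array([0.40, 0.25, 0.15, 0.15, 0.05]))
print(gumbel_top_k(vocab_log_probs, k=3))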
format Online
Article
Text
id pubmed-7644051
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-76440512020-11-16 More to diverse: Generating diversified responses in a task oriented multimodal dialog system Firdaus, Mauajama Pratap Shandeelya, Arunav Ekbal, Asif PLoS One Research Article Multimodal dialogue systems, owing to their manifold applications, have gained much attention from researchers and developers in recent times. With the release of the large-scale multimodal dialog dataset of Saha et al. 2018 on the fashion domain, it has become possible to investigate dialogue systems having both textual and visual modalities. Response generation is an essential aspect of every dialogue system, and making the responses diverse is an important problem. For any goal-oriented conversational agent, the system's responses must be informative, diverse and polite, as this can lead to better user experiences. In this paper, we propose an end-to-end neural framework for generating varied responses in a multimodal dialogue setup, capturing information from both the text and the image. A multimodal encoder with co-attention between the text and the image is used to focus on the different modalities and obtain better contextual information. For effective information sharing across the modalities, we combine the information of text and images using the BLOCK fusion technique, which helps in learning an improved multimodal representation. We employ stochastic beam search with the Gumbel Top-k trick to achieve diversified responses while preserving the content and politeness of the responses. Experimental results show that our proposed approach performs significantly better than the existing and baseline methods in terms of distinct metrics, and thereby generates more diverse responses that are informative, interesting and polite without any loss of information. Empirical evaluation also reveals that images, when used along with the text, improve the effectiveness of the model in generating diversified responses. Public Library of Science 2020-11-05 /pmc/articles/PMC7644051/ /pubmed/33151948 http://dx.doi.org/10.1371/journal.pone.0241271 Text en © 2020 Firdaus et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Firdaus, Mauajama
Pratap Shandeelya, Arunav
Ekbal, Asif
More to diverse: Generating diversified responses in a task oriented multimodal dialog system
title More to diverse: Generating diversified responses in a task oriented multimodal dialog system
title_full More to diverse: Generating diversified responses in a task oriented multimodal dialog system
title_fullStr More to diverse: Generating diversified responses in a task oriented multimodal dialog system
title_full_unstemmed More to diverse: Generating diversified responses in a task oriented multimodal dialog system
title_short More to diverse: Generating diversified responses in a task oriented multimodal dialog system
title_sort more to diverse: generating diversified responses in a task oriented multimodal dialog system
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7644051/
https://www.ncbi.nlm.nih.gov/pubmed/33151948
http://dx.doi.org/10.1371/journal.pone.0241271
work_keys_str_mv AT firdausmauajama moretodiversegeneratingdiversifiedresponsesinataskorientedmultimodaldialogsystem
AT pratapshandeelyaarunav moretodiversegeneratingdiversifiedresponsesinataskorientedmultimodaldialogsystem
AT ekbalasif moretodiversegeneratingdiversifiedresponsesinataskorientedmultimodaldialogsystem