NMN-VD: A Neural Module Network for Visual Dialog
Visual dialog demonstrates several important aspects of multimodal artificial intelligence; however, it is hindered by visual grounding and visual coreference resolution problems. To overcome these problems, we propose the novel neural module network for visual dialog (NMN-VD). NMN-VD is an efficient question-customized modular network model that combines only the modules required for deciding answers after analyzing input questions. In particular, the model includes a Refer module that effectively finds the visual area indicated by a pronoun using a reference pool to solve a visual coreference resolution problem, which is an important challenge in visual dialog. In addition, the proposed NMN-VD model includes a method for distinguishing and handling impersonal pronouns that do not require visual coreference resolution from general pronouns. Furthermore, a new Compare module that effectively handles comparison questions found in visual dialogs is included in the model, as well as a Find module that applies a triple-attention mechanism to solve visual grounding problems between the question and the image. The results of various experiments conducted using a set of large-scale benchmark data verify the efficacy and high performance of our proposed NMN-VD model.
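The abstract describes NMN-VD as assembling only the modules a given question needs, with a Find module for grounding a phrase in the image and a Refer module that resolves pronouns through a reference pool built from earlier dialog turns. As a rough, hedged illustration of that compositional idea (not the paper's implementation; the class names, tensor shapes, and scoring functions below are assumptions), a minimal PyTorch sketch:

```python
# Hypothetical sketch of a question-customized modular network in the spirit of
# NMN-VD. Names (FindModule, ReferModule, run_layout) are illustrative only.
import torch
import torch.nn as nn


class FindModule(nn.Module):
    """Visual grounding: score image regions against the question feature."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, region_feats: torch.Tensor, q_feat: torch.Tensor) -> torch.Tensor:
        # region_feats: (num_regions, dim); q_feat: (dim,)
        logits = self.score(region_feats * q_feat).squeeze(-1)  # fuse, then score
        return torch.softmax(logits, dim=0)                     # attention over regions


class ReferModule(nn.Module):
    """Coreference: reuse attention maps of entities stored in a reference pool."""
    def forward(self, pool_feats: torch.Tensor, pool_maps: torch.Tensor,
                q_feat: torch.Tensor) -> torch.Tensor:
        # pool_feats: (pool, dim) entity features from earlier turns
        # pool_maps:  (pool, num_regions) their attention maps
        weights = torch.softmax(pool_feats @ q_feat, dim=0)  # match pronoun to entity
        return weights @ pool_maps                            # blended attention map


def run_layout(layout, find, refer, region_feats, q_feat, pool_feats, pool_maps):
    """Execute a module layout predicted from the question, e.g. ["Refer"] or ["Find"]."""
    att = None
    for name in layout:
        if name == "Find":
            att = find(region_feats, q_feat)
        elif name == "Refer":
            att = refer(pool_feats, pool_maps, q_feat)
    return att


# Toy usage with random features.
dim, regions, pool = 16, 36, 3
find, refer = FindModule(dim), ReferModule()
att = run_layout(["Find"], find, refer,
                 torch.randn(regions, dim), torch.randn(dim),
                 torch.randn(pool, dim),
                 torch.softmax(torch.randn(pool, regions), dim=-1))
print(att.shape)  # torch.Size([36])
```

The actual model additionally uses a triple-attention Find module, a Compare module for comparison questions, and a question analyzer that decides which modules to compose; this sketch only conveys the overall compositional structure.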
Main Authors: | Cho, Yeongsu; Kim, Incheol |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7866498/ https://www.ncbi.nlm.nih.gov/pubmed/33573265 http://dx.doi.org/10.3390/s21030931 |
_version_ | 1783648089114935296 |
author | Cho, Yeongsu; Kim, Incheol |
author_facet | Cho, Yeongsu; Kim, Incheol |
author_sort | Cho, Yeongsu |
collection | PubMed |
description | Visual dialog demonstrates several important aspects of multimodal artificial intelligence; however, it is hindered by visual grounding and visual coreference resolution problems. To overcome these problems, we propose the novel neural module network for visual dialog (NMN-VD). NMN-VD is an efficient question-customized modular network model that combines only the modules required for deciding answers after analyzing input questions. In particular, the model includes a Refer module that effectively finds the visual area indicated by a pronoun using a reference pool to solve a visual coreference resolution problem, which is an important challenge in visual dialog. In addition, the proposed NMN-VD model includes a method for distinguishing and handling impersonal pronouns that do not require visual coreference resolution from general pronouns. Furthermore, a new Compare module that effectively handles comparison questions found in visual dialogs is included in the model, as well as a Find module that applies a triple-attention mechanism to solve visual grounding problems between the question and the image. The results of various experiments conducted using a set of large-scale benchmark data verify the efficacy and high performance of our proposed NMN-VD model. |
format | Online Article Text |
id | pubmed-7866498 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-78664982021-02-07 NMN-VD: A Neural Module Network for Visual Dialog Cho, Yeongsu Kim, Incheol Sensors (Basel) Article Visual dialog demonstrates several important aspects of multimodal artificial intelligence; however, it is hindered by visual grounding and visual coreference resolution problems. To overcome these problems, we propose the novel neural module network for visual dialog (NMN-VD). NMN-VD is an efficient question-customized modular network model that combines only the modules required for deciding answers after analyzing input questions. In particular, the model includes a Refer module that effectively finds the visual area indicated by a pronoun using a reference pool to solve a visual coreference resolution problem, which is an important challenge in visual dialog. In addition, the proposed NMN-VD model includes a method for distinguishing and handling impersonal pronouns that do not require visual coreference resolution from general pronouns. Furthermore, a new Compare module that effectively handles comparison questions found in visual dialogs is included in the model, as well as a Find module that applies a triple-attention mechanism to solve visual grounding problems between the question and the image. The results of various experiments conducted using a set of large-scale benchmark data verify the efficacy and high performance of our proposed NMN-VD model. MDPI 2021-01-30 /pmc/articles/PMC7866498/ /pubmed/33573265 http://dx.doi.org/10.3390/s21030931 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Cho, Yeongsu Kim, Incheol NMN-VD: A Neural Module Network for Visual Dialog |
title | NMN-VD: A Neural Module Network for Visual Dialog |
title_full | NMN-VD: A Neural Module Network for Visual Dialog |
title_fullStr | NMN-VD: A Neural Module Network for Visual Dialog |
title_full_unstemmed | NMN-VD: A Neural Module Network for Visual Dialog |
title_short | NMN-VD: A Neural Module Network for Visual Dialog |
title_sort | nmn-vd: a neural module network for visual dialog |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7866498/ https://www.ncbi.nlm.nih.gov/pubmed/33573265 http://dx.doi.org/10.3390/s21030931 |
work_keys_str_mv | AT choyeongsu nmnvdaneuralmodulenetworkforvisualdialog AT kimincheol nmnvdaneuralmodulenetworkforvisualdialog |