NMN-VD: A Neural Module Network for Visual Dialog

Bibliographic Details
Main authors: Cho, Yeongsu; Kim, Incheol
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7866498/
https://www.ncbi.nlm.nih.gov/pubmed/33573265
http://dx.doi.org/10.3390/s21030931
_version_ 1783648089114935296
author Cho, Yeongsu
Kim, Incheol
author_facet Cho, Yeongsu
Kim, Incheol
author_sort Cho, Yeongsu
collection PubMed
description Visual dialog demonstrates several important aspects of multimodal artificial intelligence; however, it is hindered by visual grounding and visual coreference resolution problems. To overcome these problems, we propose the novel neural module network for visual dialog (NMN-VD). NMN-VD is an efficient question-customized modular network model that combines only the modules required for deciding answers after analyzing input questions. In particular, the model includes a Refer module that effectively finds the visual area indicated by a pronoun using a reference pool to solve a visual coreference resolution problem, which is an important challenge in visual dialog. In addition, the proposed NMN-VD model includes a method for distinguishing and handling impersonal pronouns that do not require visual coreference resolution from general pronouns. Furthermore, a new Compare module that effectively handles comparison questions found in visual dialogs is included in the model, as well as a Find module that applies a triple-attention mechanism to solve visual grounding problems between the question and the image. The results of various experiments conducted using a set of large-scale benchmark data verify the efficacy and high performance of our proposed NMN-VD model.
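The description above says the model combines only the modules a question needs, skips coreference resolution for impersonal pronouns, and uses a reference pool to resolve the rest. The following Python sketch illustrates that idea only. It is not the authors' implementation: the cue word lists, the `IMPERSONAL_IT` pattern, the `select_modules` function, and the "most recent entry" lookup in `ReferencePool` are all hypothetical stand-ins, since the record does not say how NMN-VD analyzes questions or searches its pool.

```python
import re
from dataclasses import dataclass, field

# Hypothetical cue lists; the record does not specify how questions are analyzed.
PRONOUNS = {"it", "he", "she", "they", "him", "her", "them", "its", "his", "their"}
COMPARE_CUES = {"same", "than", "bigger", "smaller", "taller", "older", "younger"}

# Hypothetical pattern for impersonal "it" (weather or time of day), which
# points at no image region and so needs no coreference resolution.
IMPERSONAL_IT = re.compile(
    r"\bis it (sunny|raining|rainy|cloudy|snowing|daytime|day|nighttime|night)\b"
)


def tokenize(question):
    """Lowercase the question and split it into word tokens."""
    return re.findall(r"[a-z']+", question.lower())


def select_modules(question):
    """Return the module names a question would need, in execution order."""
    tokens = tokenize(question)
    modules = ["Find"]  # visual grounding is assumed to be needed for every question
    has_pronoun = any(t in PRONOUNS for t in tokens)
    impersonal = bool(IMPERSONAL_IT.search(question.lower()))
    if has_pronoun and not impersonal:
        modules.append("Refer")
    if any(t in COMPARE_CUES for t in tokens):
        modules.append("Compare")
    return modules


@dataclass
class ReferencePool:
    """Stores (noun phrase, region id) pairs grounded in earlier dialog rounds."""
    entries: list = field(default_factory=list)

    def add(self, phrase, region_id):
        self.entries.append((phrase, region_id))

    def resolve(self):
        """Simplification: return the most recently grounded region, or None."""
        return self.entries[-1][1] if self.entries else None


pool = ReferencePool()
pool.add("a brown dog", 3)
pool.add("a red ball", 7)
print(select_modules("What color is it?"), pool.resolve())
```

In this sketch, "Is it sunny?" selects only the Find module because the pronoun is treated as impersonal, while "What color is it?" adds Refer. A learned model would replace the keyword rules with a trained question parser and the recency lookup with a scored search over the pool.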
format Online
Article
Text
id pubmed-7866498
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-78664982021-02-07 NMN-VD: A Neural Module Network for Visual Dialog Cho, Yeongsu Kim, Incheol Sensors (Basel) Article Visual dialog demonstrates several important aspects of multimodal artificial intelligence; however, it is hindered by visual grounding and visual coreference resolution problems. To overcome these problems, we propose the novel neural module network for visual dialog (NMN-VD). NMN-VD is an efficient question-customized modular network model that combines only the modules required for deciding answers after analyzing input questions. In particular, the model includes a Refer module that effectively finds the visual area indicated by a pronoun using a reference pool to solve a visual coreference resolution problem, which is an important challenge in visual dialog. In addition, the proposed NMN-VD model includes a method for distinguishing and handling impersonal pronouns that do not require visual coreference resolution from general pronouns. Furthermore, a new Compare module that effectively handles comparison questions found in visual dialogs is included in the model, as well as a Find module that applies a triple-attention mechanism to solve visual grounding problems between the question and the image. The results of various experiments conducted using a set of large-scale benchmark data verify the efficacy and high performance of our proposed NMN-VD model. MDPI 2021-01-30 /pmc/articles/PMC7866498/ /pubmed/33573265 http://dx.doi.org/10.3390/s21030931 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Cho, Yeongsu
Kim, Incheol
NMN-VD: A Neural Module Network for Visual Dialog
title NMN-VD: A Neural Module Network for Visual Dialog
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7866498/
https://www.ncbi.nlm.nih.gov/pubmed/33573265
http://dx.doi.org/10.3390/s21030931
work_keys_str_mv AT choyeongsu nmnvdaneuralmodulenetworkforvisualdialog
AT kimincheol nmnvdaneuralmodulenetworkforvisualdialog