NMN-VD: A Neural Module Network for Visual Dialog
Visual dialog demonstrates several important aspects of multimodal artificial intelligence; however, it is hindered by visual grounding and visual coreference resolution problems. To overcome these problems, we propose the novel neural module network for visual dialog (NMN-VD). NMN-VD is an efficient question-customized modular network model that combines only the modules required for deciding answers after analyzing input questions. In particular, the model includes a Refer module that effectively finds the visual area indicated by a pronoun using a reference pool to solve a visual coreference resolution problem, which is an important challenge in visual dialog. In addition, the proposed NMN-VD model includes a method for distinguishing and handling impersonal pronouns that do not require visual coreference resolution from general pronouns. Furthermore, a new Compare module that effectively handles comparison questions found in visual dialogs is included in the model, as well as a Find module that applies a triple-attention mechanism to solve visual grounding problems between the question and the image. The results of various experiments conducted using a set of large-scale benchmark data verify the efficacy and high performance of our proposed NMN-VD model.
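The abstract describes NMN-VD as assembling only the modules a given question needs, with a Find module for grounding a phrase in the image and a Refer module that resolves pronouns through a reference pool built from earlier dialog turns. As a rough, hedged illustration of that compositional idea (not the paper's implementation; the class names, tensor shapes, and scoring functions below are assumptions), a minimal PyTorch sketch:

```python
# Hypothetical sketch of a question-customized modular network in the spirit of
# NMN-VD. Names (FindModule, ReferModule, run_layout) are illustrative only.
import torch
import torch.nn as nn


class FindModule(nn.Module):
    """Visual grounding: score image regions against the question feature."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, region_feats: torch.Tensor, q_feat: torch.Tensor) -> torch.Tensor:
        # region_feats: (num_regions, dim); q_feat: (dim,)
        logits = self.score(region_feats * q_feat).squeeze(-1)  # fuse, then score
        return torch.softmax(logits, dim=0)                     # attention over regions


class ReferModule(nn.Module):
    """Coreference: reuse attention maps of entities stored in a reference pool."""
    def forward(self, pool_feats: torch.Tensor, pool_maps: torch.Tensor,
                q_feat: torch.Tensor) -> torch.Tensor:
        # pool_feats: (pool, dim) entity features from earlier turns
        # pool_maps:  (pool, num_regions) their attention maps
        weights = torch.softmax(pool_feats @ q_feat, dim=0)  # match pronoun to entity
        return weights @ pool_maps                            # blended attention map


def run_layout(layout, find, refer, region_feats, q_feat, pool_feats, pool_maps):
    """Execute a module layout predicted from the question, e.g. ["Refer"] or ["Find"]."""
    att = None
    for name in layout:
        if name == "Find":
            att = find(region_feats, q_feat)
        elif name == "Refer":
            att = refer(pool_feats, pool_maps, q_feat)
    return att


# Toy usage with random features.
dim, regions, pool = 16, 36, 3
find, refer = FindModule(dim), ReferModule()
att = run_layout(["Find"], find, refer,
                 torch.randn(regions, dim), torch.randn(dim),
                 torch.randn(pool, dim),
                 torch.softmax(torch.randn(pool, regions), dim=-1))
print(att.shape)  # torch.Size([36])
```

The actual model additionally uses a triple-attention Find module, a Compare module for comparison questions, and a question analyzer that decides which modules to compose; this sketch only conveys the overall compositional structure.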
Main Authors: | Cho, Yeongsu; Kim, Incheol |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7866498/ https://www.ncbi.nlm.nih.gov/pubmed/33573265 http://dx.doi.org/10.3390/s21030931 |
_version_ | 1783648089114935296 |
author | Cho, Yeongsu; Kim, Incheol |
author_facet | Cho, Yeongsu; Kim, Incheol |
author_sort | Cho, Yeongsu |
collection | PubMed |
description | Visual dialog demonstrates several important aspects of multimodal artificial intelligence; however, it is hindered by visual grounding and visual coreference resolution problems. To overcome these problems, we propose the novel neural module network for visual dialog (NMN-VD). NMN-VD is an efficient question-customized modular network model that combines only the modules required for deciding answers after analyzing input questions. In particular, the model includes a Refer module that effectively finds the visual area indicated by a pronoun using a reference pool to solve a visual coreference resolution problem, which is an important challenge in visual dialog. In addition, the proposed NMN-VD model includes a method for distinguishing and handling impersonal pronouns that do not require visual coreference resolution from general pronouns. Furthermore, a new Compare module that effectively handles comparison questions found in visual dialogs is included in the model, as well as a Find module that applies a triple-attention mechanism to solve visual grounding problems between the question and the image. The results of various experiments conducted using a set of large-scale benchmark data verify the efficacy and high performance of our proposed NMN-VD model. |
format | Online Article Text |
id | pubmed-7866498 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-78664982021-02-07 NMN-VD: A Neural Module Network for Visual Dialog Cho, Yeongsu Kim, Incheol Sensors (Basel) Article Visual dialog demonstrates several important aspects of multimodal artificial intelligence; however, it is hindered by visual grounding and visual coreference resolution problems. To overcome these problems, we propose the novel neural module network for visual dialog (NMN-VD). NMN-VD is an efficient question-customized modular network model that combines only the modules required for deciding answers after analyzing input questions. In particular, the model includes a Refer module that effectively finds the visual area indicated by a pronoun using a reference pool to solve a visual coreference resolution problem, which is an important challenge in visual dialog. In addition, the proposed NMN-VD model includes a method for distinguishing and handling impersonal pronouns that do not require visual coreference resolution from general pronouns. Furthermore, a new Compare module that effectively handles comparison questions found in visual dialogs is included in the model, as well as a Find module that applies a triple-attention mechanism to solve visual grounding problems between the question and the image. The results of various experiments conducted using a set of large-scale benchmark data verify the efficacy and high performance of our proposed NMN-VD model. MDPI 2021-01-30 /pmc/articles/PMC7866498/ /pubmed/33573265 http://dx.doi.org/10.3390/s21030931 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Cho, Yeongsu Kim, Incheol NMN-VD: A Neural Module Network for Visual Dialog |
title | NMN-VD: A Neural Module Network for Visual Dialog |
title_full | NMN-VD: A Neural Module Network for Visual Dialog |
title_fullStr | NMN-VD: A Neural Module Network for Visual Dialog |
title_full_unstemmed | NMN-VD: A Neural Module Network for Visual Dialog |
title_short | NMN-VD: A Neural Module Network for Visual Dialog |
title_sort | nmn-vd: a neural module network for visual dialog |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7866498/ https://www.ncbi.nlm.nih.gov/pubmed/33573265 http://dx.doi.org/10.3390/s21030931 |
work_keys_str_mv | AT choyeongsu nmnvdaneuralmodulenetworkforvisualdialog AT kimincheol nmnvdaneuralmodulenetworkforvisualdialog |