Toward Accurate Visual Reasoning With Dual-Path Neural Module Networks
Visual reasoning is a critical stage in visual question answering (Antol et al., 2015), but most state-of-the-art methods treat VQA as a classification problem without taking the reasoning process into account. Various approaches have been proposed to solve this multi-modal task, which requires both comprehension and reasoning.
Main Authors: | Su, Ke; Su, Hang; Li, Jianguo; Zhu, Jun |
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2020 |
Subjects: | Robotics and AI |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805672/ https://www.ncbi.nlm.nih.gov/pubmed/33501276 http://dx.doi.org/10.3389/frobt.2020.00109 |
_version_ | 1783636353688272896 |
author | Su, Ke Su, Hang Li, Jianguo Zhu, Jun |
author_facet | Su, Ke Su, Hang Li, Jianguo Zhu, Jun |
author_sort | Su, Ke |
collection | PubMed |
description | Visual reasoning is a critical stage in visual question answering (Antol et al., 2015), but most state-of-the-art methods treat VQA as a classification problem without taking the reasoning process into account. Various approaches have been proposed to solve this multi-modal task, which requires both comprehension and reasoning abilities. The recently proposed neural module network (Andreas et al., 2016b), which assembles the model from a few primitive modules, is capable of performing spatial or arithmetical reasoning over the input image to answer questions. Nevertheless, its performance is not satisfying, especially on real-world datasets (e.g., VQA 1.0 & 2.0), due to its limited primitive modules and suboptimal layouts. To address these issues, we propose the Dual-Path Neural Module Network, a novel method that can perform complex visual reasoning by forming a more flexible layout regularized by a pairwise loss. Specifically, we first use a region proposal network to generate both visual and spatial information, which helps the model perform spatial reasoning. Then, we advocate processing a pair of different images together with the same question simultaneously, named a "complementary pair," which encourages the model to learn a more reasonable layout by suppressing overfitting to language priors. The model jointly learns the parameters of the primitive modules and the layout generation policy, which is further boosted by introducing a novel pairwise reward. Extensive experiments show that our approach significantly improves the performance of neural module networks, especially on real-world datasets. |
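The description above outlines the training scheme at a high level: a module layout is derived from the question, executed on two different images that share that question (a "complementary pair"), and the layout policy is additionally guided by a pairwise reward. The following is a minimal sketch of that idea, assuming a PyTorch-style setup; the `model`, `layout_policy`, and loss weighting are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Sketch only (not the authors' code): one training step on a "complementary
# pair" -- two different images paired with the same question -- so that the
# learned layout cannot rely on language priors alone.
import torch
import torch.nn.functional as F

def complementary_pair_step(model, layout_policy, question, img_a, img_b,
                            ans_a, ans_b, policy_weight=0.5):
    """Compute the joint loss for a complementary pair (same question, two images)."""
    # The layout policy proposes a module layout from the question alone,
    # so both images in the pair are processed by the same assembled program.
    layout, log_prob = layout_policy.sample(question)

    # Execute the assembled module network on each image independently.
    logits_a = model(img_a, question, layout)
    logits_b = model(img_b, question, layout)

    # Standard VQA classification losses for the two (different) answers.
    loss_a = F.cross_entropy(logits_a, ans_a)
    loss_b = F.cross_entropy(logits_b, ans_b)

    # Pairwise term: because the answers differ while the question is fixed,
    # minimizing both errors jointly suppresses layouts and answers that
    # merely reflect language priors.
    pair_loss = loss_a + loss_b

    # REINFORCE-style update for the discrete layout choice, rewarding layouts
    # that answer *both* images of the pair correctly (a pairwise reward).
    with torch.no_grad():
        reward = ((logits_a.argmax(-1) == ans_a) &
                  (logits_b.argmax(-1) == ans_b)).float().mean()
    policy_loss = -reward * log_prob.mean()

    return pair_loss + policy_weight * policy_loss
```

Sampling the layout from the question alone and then scoring it on both images is what ties the reward to visual evidence rather than to the answer statistics of the question.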
format | Online Article Text |
id | pubmed-7805672 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-78056722021-01-25 Toward Accurate Visual Reasoning With Dual-Path Neural Module Networks Su, Ke Su, Hang Li, Jianguo Zhu, Jun Front Robot AI Robotics and AI Visual reasoning is a critical stage in visual question answering (Antol et al., 2015), but most state-of-the-art methods treat VQA as a classification problem without taking the reasoning process into account. Various approaches have been proposed to solve this multi-modal task, which requires both comprehension and reasoning abilities. The recently proposed neural module network (Andreas et al., 2016b), which assembles the model from a few primitive modules, is capable of performing spatial or arithmetical reasoning over the input image to answer questions. Nevertheless, its performance is not satisfying, especially on real-world datasets (e.g., VQA 1.0 & 2.0), due to its limited primitive modules and suboptimal layouts. To address these issues, we propose the Dual-Path Neural Module Network, a novel method that can perform complex visual reasoning by forming a more flexible layout regularized by a pairwise loss. Specifically, we first use a region proposal network to generate both visual and spatial information, which helps the model perform spatial reasoning. Then, we advocate processing a pair of different images together with the same question simultaneously, named a "complementary pair," which encourages the model to learn a more reasonable layout by suppressing overfitting to language priors. The model jointly learns the parameters of the primitive modules and the layout generation policy, which is further boosted by introducing a novel pairwise reward. Extensive experiments show that our approach significantly improves the performance of neural module networks, especially on real-world datasets. Frontiers Media S.A. 2020-08-21 /pmc/articles/PMC7805672/ /pubmed/33501276 http://dx.doi.org/10.3389/frobt.2020.00109 Text en Copyright © 2020 Su, Su, Li and Zhu. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Robotics and AI Su, Ke Su, Hang Li, Jianguo Zhu, Jun Toward Accurate Visual Reasoning With Dual-Path Neural Module Networks |
title | Toward Accurate Visual Reasoning With Dual-Path Neural Module Networks |
title_full | Toward Accurate Visual Reasoning With Dual-Path Neural Module Networks |
title_fullStr | Toward Accurate Visual Reasoning With Dual-Path Neural Module Networks |
title_full_unstemmed | Toward Accurate Visual Reasoning With Dual-Path Neural Module Networks |
title_short | Toward Accurate Visual Reasoning With Dual-Path Neural Module Networks |
title_sort | toward accurate visual reasoning with dual-path neural module networks |
topic | Robotics and AI |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805672/ https://www.ncbi.nlm.nih.gov/pubmed/33501276 http://dx.doi.org/10.3389/frobt.2020.00109 |
work_keys_str_mv | AT suke towardaccuratevisualreasoningwithdualpathneuralmodulenetworks AT suhang towardaccuratevisualreasoningwithdualpathneuralmodulenetworks AT lijianguo towardaccuratevisualreasoningwithdualpathneuralmodulenetworks AT zhujun towardaccuratevisualreasoningwithdualpathneuralmodulenetworks |