Multi-View Visual Question Answering with Active Viewpoint Selection

This paper proposes a framework that allows the observation of a scene iteratively to answer a given question about the scene. Conventional visual question answering (VQA) methods are designed to answer given questions based on single-view images. However, in real-world applications, such as human–robot interaction (HRI), in which camera angles and occluded scenes must be considered, answering questions based on single-view images might be difficult. Since HRI applications make it possible to observe a scene from multiple viewpoints, it is reasonable to discuss the VQA task in multi-view settings. In addition, because it is usually challenging to observe a scene from arbitrary viewpoints, we designed a framework that allows the observation of a scene actively until the necessary scene information to answer a given question is obtained. The proposed framework achieves comparable performance to a state-of-the-art method in question answering and simultaneously decreases the number of required observation viewpoints by a significant margin. Additionally, we found our framework plausibly learned to choose better viewpoints for answering questions, lowering the required number of camera movements. Moreover, we built a multi-view VQA dataset based on real images. The proposed framework shows high accuracy (94.01%) for the unseen real image dataset.

Bibliographic Details
Main Authors: Qiu, Yue, Satoh, Yutaka, Suzuki, Ryota, Iwata, Kenji, Kataoka, Hirokatsu
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7219048/
https://www.ncbi.nlm.nih.gov/pubmed/32316433
http://dx.doi.org/10.3390/s20082281
_version_ 1783532917960474624
author Qiu, Yue
Satoh, Yutaka
Suzuki, Ryota
Iwata, Kenji
Kataoka, Hirokatsu
author_facet Qiu, Yue
Satoh, Yutaka
Suzuki, Ryota
Iwata, Kenji
Kataoka, Hirokatsu
author_sort Qiu, Yue
collection PubMed
description This paper proposes a framework that allows the observation of a scene iteratively to answer a given question about the scene. Conventional visual question answering (VQA) methods are designed to answer given questions based on single-view images. However, in real-world applications, such as human–robot interaction (HRI), in which camera angles and occluded scenes must be considered, answering questions based on single-view images might be difficult. Since HRI applications make it possible to observe a scene from multiple viewpoints, it is reasonable to discuss the VQA task in multi-view settings. In addition, because it is usually challenging to observe a scene from arbitrary viewpoints, we designed a framework that allows the observation of a scene actively until the necessary scene information to answer a given question is obtained. The proposed framework achieves comparable performance to a state-of-the-art method in question answering and simultaneously decreases the number of required observation viewpoints by a significant margin. Additionally, we found our framework plausibly learned to choose better viewpoints for answering questions, lowering the required number of camera movements. Moreover, we built a multi-view VQA dataset based on real images. The proposed framework shows high accuracy (94.01%) for the unseen real image dataset.
format Online
Article
Text
id pubmed-7219048
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-72190482020-05-22 Multi-View Visual Question Answering with Active Viewpoint Selection Qiu, Yue Satoh, Yutaka Suzuki, Ryota Iwata, Kenji Kataoka, Hirokatsu Sensors (Basel) Article This paper proposes a framework that allows the observation of a scene iteratively to answer a given question about the scene. Conventional visual question answering (VQA) methods are designed to answer given questions based on single-view images. However, in real-world applications, such as human–robot interaction (HRI), in which camera angles and occluded scenes must be considered, answering questions based on single-view images might be difficult. Since HRI applications make it possible to observe a scene from multiple viewpoints, it is reasonable to discuss the VQA task in multi-view settings. In addition, because it is usually challenging to observe a scene from arbitrary viewpoints, we designed a framework that allows the observation of a scene actively until the necessary scene information to answer a given question is obtained. The proposed framework achieves comparable performance to a state-of-the-art method in question answering and simultaneously decreases the number of required observation viewpoints by a significant margin. Additionally, we found our framework plausibly learned to choose better viewpoints for answering questions, lowering the required number of camera movements. Moreover, we built a multi-view VQA dataset based on real images. The proposed framework shows high accuracy (94.01%) for the unseen real image dataset. MDPI 2020-04-17 /pmc/articles/PMC7219048/ /pubmed/32316433 http://dx.doi.org/10.3390/s20082281 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Qiu, Yue
Satoh, Yutaka
Suzuki, Ryota
Iwata, Kenji
Kataoka, Hirokatsu
Multi-View Visual Question Answering with Active Viewpoint Selection
title Multi-View Visual Question Answering with Active Viewpoint Selection
title_full Multi-View Visual Question Answering with Active Viewpoint Selection
title_fullStr Multi-View Visual Question Answering with Active Viewpoint Selection
title_full_unstemmed Multi-View Visual Question Answering with Active Viewpoint Selection
title_short Multi-View Visual Question Answering with Active Viewpoint Selection
title_sort multi-view visual question answering with active viewpoint selection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7219048/
https://www.ncbi.nlm.nih.gov/pubmed/32316433
http://dx.doi.org/10.3390/s20082281
work_keys_str_mv AT qiuyue multiviewvisualquestionansweringwithactiveviewpointselection
AT satohyutaka multiviewvisualquestionansweringwithactiveviewpointselection
AT suzukiryota multiviewvisualquestionansweringwithactiveviewpointselection
AT iwatakenji multiviewvisualquestionansweringwithactiveviewpointselection
AT kataokahirokatsu multiviewvisualquestionansweringwithactiveviewpointselection