
Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering

Collaborative reasoning for knowledge-based visual question answering is challenging but vital and effective for understanding the features of images and questions. Previous methods either jointly fuse all kinds of features with an attention mechanism or use handcrafted rules to generate a layout for compositional reasoning; these approaches lack an explicit visual reasoning process and introduce a large number of parameters for predicting the correct answer. To conduct visual reasoning on all kinds of image–question pairs, in this paper we propose a novel reasoning model, a question-guided tree structure with a knowledge base (QGTSKB), to address these problems. The model consists of four neural module networks: an attention module that locates attended regions from image features and question embeddings via an attention mechanism; a gated reasoning module that forgets and updates fused features; a fusion reasoning module that mines high-level semantics of the attended visual features and the knowledge base; and a knowledge-based fact module that compensates for missing visual and textual information with external knowledge. The model thus performs visual analysis and reasoning based on tree structures, a knowledge base and the four neural module networks. Experimental results show that it achieves superior performance over existing methods on the VQA v2.0 and CLEVR datasets, and visual reasoning experiments demonstrate the interpretability of the model.
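The abstract outlines an architecture of four neural module networks composed along a question-guided tree. Below is a minimal sketch of one reasoning step built from such modules, written in PyTorch; all module names, signatures, dimensions and the composition scheme are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of composing four neural modules at one node of a question-guided
# tree, loosely following the QGTSKB description in the abstract.
# Shapes, names and the composition order are assumptions for illustration.
import torch
import torch.nn as nn


class AttentionModule(nn.Module):
    """Scores image regions against the question embedding."""
    def __init__(self, d):
        super().__init__()
        self.score = nn.Linear(2 * d, 1)

    def forward(self, regions, question):          # regions: (R, d), question: (d,)
        q = question.expand(regions.size(0), -1)   # broadcast question to each region
        alpha = torch.softmax(
            self.score(torch.cat([regions, q], dim=-1)).squeeze(-1), dim=0)
        return alpha @ regions                     # attended visual feature, (d,)


class GatedReasoningModule(nn.Module):
    """Forgets and updates fused features with a learned gate."""
    def __init__(self, d):
        super().__init__()
        self.gate = nn.Linear(2 * d, d)
        self.update = nn.Linear(2 * d, d)

    def forward(self, state, evidence):            # state, evidence: (d,)
        h = torch.cat([state, evidence], dim=-1)
        g = torch.sigmoid(self.gate(h))            # forget gate
        return g * state + (1 - g) * torch.tanh(self.update(h))


class FusionReasoningModule(nn.Module):
    """Mines higher-level semantics from visual and knowledge features."""
    def __init__(self, d):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU())

    def forward(self, visual, knowledge):          # both (d,)
        return self.fuse(torch.cat([visual, knowledge], dim=-1))


class KnowledgeFactModule(nn.Module):
    """Retrieves a soft summary of external facts relevant to the question."""
    def __init__(self, d):
        super().__init__()
        self.key = nn.Linear(d, d)

    def forward(self, facts, question):            # facts: (F, d), question: (d,)
        w = torch.softmax(self.key(facts) @ question, dim=0)
        return w @ facts                           # fact summary, (d,)


# Usage: one reasoning step at a tree node. Attend to regions, retrieve
# external facts, fuse both, then gate the fused evidence into the state.
d = 8
regions, facts, question = torch.randn(5, d), torch.randn(10, d), torch.randn(d)
attend, fact = AttentionModule(d), KnowledgeFactModule(d)
fuse, gate = FusionReasoningModule(d), GatedReasoningModule(d)
node = fuse(attend(regions, question), fact(facts, question))
out = gate(question, node)                         # updated node state, (d,)
```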

Bibliographic Details
Main Authors: Li, Qifeng; Tang, Xinyi; Jian, Yi
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8874875/
https://www.ncbi.nlm.nih.gov/pubmed/35214484
http://dx.doi.org/10.3390/s22041575
collection PubMed
format Online
Article
Text
id pubmed-8874875
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8874875 2022-02-26 Sensors (Basel) Article MDPI 2022-02-17 /pmc/articles/PMC8874875/ /pubmed/35214484 http://dx.doi.org/10.3390/s22041575 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering
topic Article