
Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering

Collaborative reasoning for knowledge-based visual question answering is challenging but vital and effective for understanding the features of images and questions. Previous methods either jointly fuse all kinds of features with an attention mechanism or use handcrafted rules to generate a layout for compositional reasoning; these approaches lack an explicit visual reasoning process and introduce a large number of parameters for predicting the correct answer. To conduct visual reasoning on all kinds of image–question pairs, in this paper we propose a novel reasoning model, a question-guided tree structure with a knowledge base (QGTSKB), to address these problems. The model consists of four neural module networks: an attention module that locates attended regions from image features and question embeddings via an attention mechanism; a gated reasoning module that forgets and updates fused features; a fusion reasoning module that mines high-level semantics of the attended visual features and the knowledge base; and a knowledge-based fact module that compensates for missing visual and textual information with external knowledge. The model thus performs visual analysis and reasoning based on tree structures, a knowledge base and the four neural module networks. Experimental results show that it achieves superior performance over existing methods on the VQA v2.0 and CLEVR datasets, and visual reasoning experiments demonstrate the interpretability of the model.
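The abstract outlines an architecture of four neural module networks composed along a question-guided tree. Below is a minimal sketch of one reasoning step built from such modules, written in PyTorch; all module names, signatures, dimensions and the composition scheme are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of composing four neural modules at one node of a question-guided
# tree, loosely following the QGTSKB description in the abstract.
# Shapes, names and the composition order are assumptions for illustration.
import torch
import torch.nn as nn


class AttentionModule(nn.Module):
    """Scores image regions against the question embedding."""
    def __init__(self, d):
        super().__init__()
        self.score = nn.Linear(2 * d, 1)

    def forward(self, regions, question):          # regions: (R, d), question: (d,)
        q = question.expand(regions.size(0), -1)   # broadcast question to each region
        alpha = torch.softmax(
            self.score(torch.cat([regions, q], dim=-1)).squeeze(-1), dim=0)
        return alpha @ regions                     # attended visual feature, (d,)


class GatedReasoningModule(nn.Module):
    """Forgets and updates fused features with a learned gate."""
    def __init__(self, d):
        super().__init__()
        self.gate = nn.Linear(2 * d, d)
        self.update = nn.Linear(2 * d, d)

    def forward(self, state, evidence):            # state, evidence: (d,)
        h = torch.cat([state, evidence], dim=-1)
        g = torch.sigmoid(self.gate(h))            # forget gate
        return g * state + (1 - g) * torch.tanh(self.update(h))


class FusionReasoningModule(nn.Module):
    """Mines higher-level semantics from visual and knowledge features."""
    def __init__(self, d):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU())

    def forward(self, visual, knowledge):          # both (d,)
        return self.fuse(torch.cat([visual, knowledge], dim=-1))


class KnowledgeFactModule(nn.Module):
    """Retrieves a soft summary of external facts relevant to the question."""
    def __init__(self, d):
        super().__init__()
        self.key = nn.Linear(d, d)

    def forward(self, facts, question):            # facts: (F, d), question: (d,)
        w = torch.softmax(self.key(facts) @ question, dim=0)
        return w @ facts                           # fact summary, (d,)


# Usage: one reasoning step at a tree node. Attend to regions, retrieve
# external facts, fuse both, then gate the fused evidence into the state.
d = 8
regions, facts, question = torch.randn(5, d), torch.randn(10, d), torch.randn(d)
attend, fact = AttentionModule(d), KnowledgeFactModule(d)
fuse, gate = FusionReasoningModule(d), GatedReasoningModule(d)
node = fuse(attend(regions, question), fact(facts, question))
out = gate(question, node)                         # updated node state, (d,)
```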

Bibliographic Details
Main Authors: Li, Qifeng; Tang, Xinyi; Jian, Yi
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8874875/
https://www.ncbi.nlm.nih.gov/pubmed/35214484
http://dx.doi.org/10.3390/s22041575
collection PubMed
format Online
Article
Text
id pubmed-8874875
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8874875 2022-02-26 Sensors (Basel) Article MDPI 2022-02-17 /pmc/articles/PMC8874875/ /pubmed/35214484 http://dx.doi.org/10.3390/s22041575 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering
topic Article