Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering
Collaborative reasoning for knowledge-based visual question answering is challenging but vital and efficient for understanding the features of images and questions. Previous methods either jointly fuse all kinds of features through an attention mechanism or use handcrafted rules to generate a layout for compositional reasoning; both approaches lack an explicit visual reasoning process and introduce a large number of parameters for predicting the correct answer. To conduct visual reasoning on all kinds of image–question pairs, this paper proposes a novel reasoning model, a question-guided tree structure with a knowledge base (QGTSKB), to address these problems.
Main Authors: Li, Qifeng; Tang, Xinyi; Jian, Yi
Format: Online Article Text
Language: English
Published: MDPI, 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8874875/ https://www.ncbi.nlm.nih.gov/pubmed/35214484 http://dx.doi.org/10.3390/s22041575
_version_ | 1784657792765788160 |
author | Li, Qifeng; Tang, Xinyi; Jian, Yi |
author_facet | Li, Qifeng; Tang, Xinyi; Jian, Yi |
author_sort | Li, Qifeng |
collection | PubMed |
description | Collaborative reasoning for knowledge-based visual question answering is challenging but vital and efficient for understanding the features of images and questions. Previous methods either jointly fuse all kinds of features through an attention mechanism or use handcrafted rules to generate a layout for compositional reasoning; both approaches lack an explicit visual reasoning process and introduce a large number of parameters for predicting the correct answer. To conduct visual reasoning on all kinds of image–question pairs, this paper proposes a novel reasoning model, a question-guided tree structure with a knowledge base (QGTSKB), to address these problems. The model consists of four neural module networks: an attention model that locates attended regions from the image features and question embeddings via an attention mechanism; a gated reasoning model that forgets and updates the fused features; a fusion reasoning model that mines high-level semantics from the attended visual features and the knowledge base; and a knowledge-based fact model that compensates for missing visual and textual information with external knowledge. Our model thus performs visual analysis and reasoning based on tree structures, a knowledge base and the four neural module networks. Experimental results show that our model outperforms existing methods on the VQA v2.0 and CLEVR datasets, and visual reasoning experiments demonstrate the interpretability of the model. |
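The two simplest modules in the abstract, question-guided attention over image regions and a gated (forget/update) fusion step, can be sketched as follows. This is a minimal NumPy illustration of the general techniques, not the authors' QGTSKB implementation; all function names, weight matrices, and dimensions are hypothetical assumptions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_module(image_feats, q_emb, W):
    # score each image region by a bilinear match with the question embedding,
    # then return the attention-weighted sum of region features
    scores = image_feats @ W @ q_emb      # shape (num_regions,)
    alpha = softmax(scores)               # attention weights, sum to 1
    return alpha @ image_feats            # attended visual feature, shape (dim,)

def gated_update(state, fused, Wz):
    # GRU-style update gate: z decides how much of the old state to keep
    # versus how much of the newly fused feature to write in
    z = 1.0 / (1.0 + np.exp(-(Wz @ np.concatenate([state, fused]))))
    return z * state + (1.0 - z) * fused

# toy example: 5 image regions with 8-dimensional features
rng = np.random.default_rng(0)
R, D = 5, 8
image_feats = rng.normal(size=(R, D))
q_emb = rng.normal(size=D)
W = rng.normal(size=(D, D)) * 0.1
attended = attention_module(image_feats, q_emb, W)

state = np.zeros(D)                       # reasoning state at a tree node
Wz = rng.normal(size=(D, 2 * D)) * 0.1
state = gated_update(state, attended, Wz)
```

In a tree-structured model, a step like `gated_update` would run at each node, merging the attended feature for that node's sub-question into the running state as answers propagate toward the root.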
format | Online Article Text |
id | pubmed-8874875 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8874875 2022-02-26 Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering Li, Qifeng Tang, Xinyi Jian, Yi Sensors (Basel) Article Collaborative reasoning for knowledge-based visual question answering is challenging but vital and efficient for understanding the features of images and questions. Previous methods either jointly fuse all kinds of features through an attention mechanism or use handcrafted rules to generate a layout for compositional reasoning; both approaches lack an explicit visual reasoning process and introduce a large number of parameters for predicting the correct answer. To conduct visual reasoning on all kinds of image–question pairs, this paper proposes a novel reasoning model, a question-guided tree structure with a knowledge base (QGTSKB), to address these problems. The model consists of four neural module networks: an attention model that locates attended regions from the image features and question embeddings via an attention mechanism; a gated reasoning model that forgets and updates the fused features; a fusion reasoning model that mines high-level semantics from the attended visual features and the knowledge base; and a knowledge-based fact model that compensates for missing visual and textual information with external knowledge. Our model thus performs visual analysis and reasoning based on tree structures, a knowledge base and the four neural module networks. Experimental results show that our model outperforms existing methods on the VQA v2.0 and CLEVR datasets, and visual reasoning experiments demonstrate the interpretability of the model. MDPI 2022-02-17 /pmc/articles/PMC8874875/ /pubmed/35214484 http://dx.doi.org/10.3390/s22041575 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Li, Qifeng Tang, Xinyi Jian, Yi Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering |
title | Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering |
title_full | Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering |
title_fullStr | Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering |
title_full_unstemmed | Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering |
title_short | Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering |
title_sort | learning to reason on tree structures for knowledge-based visual question answering |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8874875/ https://www.ncbi.nlm.nih.gov/pubmed/35214484 http://dx.doi.org/10.3390/s22041575 |
work_keys_str_mv | AT liqifeng learningtoreasonontreestructuresforknowledgebasedvisualquestionanswering AT tangxinyi learningtoreasonontreestructuresforknowledgebasedvisualquestionanswering AT jianyi learningtoreasonontreestructuresforknowledgebasedvisualquestionanswering |