Joint embedding VQA model based on dynamic word vector

The existing joint embedding Visual Question Answering models use different combinations of image characterization, text characterization and feature fusion method, but all the existing models use static word vectors for text characterization. However, in the real language environment, the same word...

Bibliographic Details
Main authors:	Ma, Zhiyang; Zheng, Wenfeng; Chen, Xiaobing; Yin, Lirong
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2021
Subjects:	Human-Computer Interaction
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959642/ https://www.ncbi.nlm.nih.gov/pubmed/33817003 http://dx.doi.org/10.7717/peerj-cs.353

_version_	1783664994263498752
author	Ma, Zhiyang; Zheng, Wenfeng; Chen, Xiaobing; Yin, Lirong
collection	PubMed
description	Existing joint embedding Visual Question Answering (VQA) models use different combinations of image characterization, text characterization and feature fusion methods, but all of them use static word vectors for text characterization. In real language use, however, the same word may carry different meanings in different contexts and may serve as different grammatical components. Static word vectors cannot express these differences effectively, so semantic and grammatical deviations may arise. To solve this problem, our article constructs a joint embedding model based on dynamic word vectors, the none KB-Specific network (N-KBSN) model, which differs from commonly used Visual Question Answering models based on static word vectors. The N-KBSN model consists of three main parts: a question text and image feature extraction module, a self-attention and guided attention module, and a feature fusion and classifier module. Its key components are image characterization based on Faster R-CNN, text characterization based on ELMo, and feature enhancement based on a multi-head attention mechanism. The experimental results show that the N-KBSN constructed in our experiment outperforms the 2017-winner (GloVe) model and the 2019-winner (GloVe) model. Introducing dynamic word vectors improves the accuracy of the overall results.
format	Online Article Text
id	pubmed-7959642
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-7959642 2021-04-02 PeerJ Comput Sci PeerJ Inc. 2021-03-03 /pmc/articles/PMC7959642/ /pubmed/33817003 http://dx.doi.org/10.7717/peerj-cs.353 Text en © 2021 Ma et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	Joint embedding VQA model based on dynamic word vector
topic	Human-Computer Interaction
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959642/ https://www.ncbi.nlm.nih.gov/pubmed/33817003 http://dx.doi.org/10.7717/peerj-cs.353
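
The description in this record names a self-attention and guided attention module built on a multi-head attention mechanism, but gives no implementation details. The following is a minimal sketch of that general technique only, not the authors' N-KBSN code. It assumes standard scaled dot-product attention, omits the learned projection matrices a real model would use, and uses made-up toy vectors; the function names `attend` and `multi_head_attend` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention: one output row per query row."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(dim)])
    return outputs

def multi_head_attend(queries, keys, values, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate.

    Assumption: learned projections are omitted to keep the sketch short.
    """
    dim = len(queries[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    step = dim // num_heads
    outputs = [[] for _ in queries]
    for h in range(num_heads):
        lo, hi = h * step, (h + 1) * step
        head = attend([q[lo:hi] for q in queries],
                      [k[lo:hi] for k in keys],
                      [v[lo:hi] for v in values])
        for row, part in zip(outputs, head):
            row.extend(part)
    return outputs

# Toy inputs (made-up numbers): 3 question-word vectors, 2 image-region vectors.
question = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.3, 0.7], [0.5, 0.5, 0.0, 1.0]]
regions = [[0.9, 0.1, 0.4, 0.4], [0.2, 0.8, 0.6, 0.1]]

# Self-attention: the question attends to itself.
self_attended = multi_head_attend(question, question, question, num_heads=2)
# Guided attention: image regions are the queries, question words the keys/values.
guided = multi_head_attend(regions, question, question, num_heads=2)
```

In this sketch, self-attention and guided attention share one routine and differ only in which features supply the queries and which supply the keys and values. Each output row is a weighted average of the value rows, so it keeps the same feature dimension as the input.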