The multi-modal fusion in visual question answering: a review of attention mechanisms
Visual Question Answering (VQA) is a significant cross-disciplinary problem in computer vision and natural language processing: given an image and a question about that image, a computer must output an answer in natural language. This requires simultaneous processing o...
Main authors: | Lu, Siyu; Liu, Mingzhe; Yin, Lirong; Yin, Zhengtong; Liu, Xuan; Zheng, Wenfeng |
---|---|
Format: | Online article, text |
Language: | English |
Published: | PeerJ Inc., 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280591/ https://www.ncbi.nlm.nih.gov/pubmed/37346665 http://dx.doi.org/10.7717/peerj-cs.1400 |
Similar items
- Characterization inference based on joint-optimization of multi-layer semantics and deep fusion matching network
  by: Zheng, Wenfeng, et al.
  Published: (2022)
- Recognition of multi-modal fusion images with irregular interference
  by: Wang, Yawei, et al.
  Published: (2022)
- A data-centric way to improve entity linking in knowledge-based question answering
  by: Liu, Shuo, et al.
  Published: (2023)
- A knowledge graph based question answering method for medical domain
  by: Huang, Xiaofeng, et al.
  Published: (2021)
- Deep learning-based approach for Arabic open domain question answering
  by: Alsubhi, Kholoud, et al.
  Published: (2022)