Multi-Modal Explicit Sparse Attention Networks for Visual Question Answering
Visual question answering (VQA) is a multi-modal task involving natural language processing (NLP) and computer vision (CV), which requires models to understand both visual and textual information simultaneously in order to predict the correct answer for the input visual image and textual question...
| Main Authors: | Guo, Zihan; Han, Dezhi |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | MDPI, 2020 |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7730290/ https://www.ncbi.nlm.nih.gov/pubmed/33255994 http://dx.doi.org/10.3390/s20236758 |
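
The abstract above concerns a VQA model built around explicit sparse attention. As a rough illustration of what "explicit sparse attention" generally refers to, the minimal PyTorch sketch below keeps only the top-k attention scores per query and masks the rest before the softmax. This is a generic, assumed illustration, not the paper's actual architecture; the function name `topk_sparse_attention`, the tensor shapes, and the value of `topk` are hypothetical.

```python
# Minimal sketch of explicit sparse (top-k) attention, assuming a standard
# scaled dot-product formulation. Not the paper's exact method.
import torch
import torch.nn.functional as F


def topk_sparse_attention(q, k, v, topk=8):
    """Scaled dot-product attention keeping only the top-k scores per query.

    q, k, v: tensors of shape (batch, num_items, dim).
    Scores outside the top-k are set to -inf before the softmax, so each
    query attends to at most `topk` keys (an "explicit sparse" attention).
    """
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5   # (B, Nq, Nk)

    # Threshold each row at its k-th largest score and mask everything below.
    topk = min(topk, scores.size(-1))
    kth_vals = scores.topk(topk, dim=-1).values[..., -1:]       # (B, Nq, 1)
    scores = scores.masked_fill(scores < kth_vals, float("-inf"))

    weights = F.softmax(scores, dim=-1)                         # sparse rows
    return torch.matmul(weights, v)                             # (B, Nq, dim)


if __name__ == "__main__":
    # Toy example: 36 image-region features attending over 14 question tokens
    # (hypothetical shapes chosen only for illustration).
    q = torch.randn(2, 36, 512)
    k = torch.randn(2, 14, 512)
    v = torch.randn(2, 14, 512)
    print(topk_sparse_attention(q, k, v, topk=4).shape)  # torch.Size([2, 36, 512])
```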
Similar Items

- An Effective Dense Co-Attention Networks for Visual Question Answering
  by: He, Shirong, et al.
  Published: (2020)
- The multi-modal fusion in visual question answering: a review of attention mechanisms
  by: Lu, Siyu, et al.
  Published: (2023)
- Multi-modal adaptive gated mechanism for visual question answering
  by: Xu, Yangshuyi, et al.
  Published: (2023)
- An effective spatial relational reasoning networks for visual question answering
  by: Shen, Xiang, et al.
  Published: (2022)
- Deep Modular Bilinear Attention Network for Visual Question Answering
  by: Yan, Feng, et al.
  Published: (2022)