Multi-modal adaptive gated mechanism for visual question answering
Visual Question Answering (VQA) is a multimodal task that uses natural language to ask and answer questions based on image content. For multimodal tasks, obtaining accurate modality feature information is crucial. Existing research on visual question answering models mainly starts from the p...
Main authors: | Xu, Yangshuyi; Zhang, Lin; Shen, Xiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10306234/ https://www.ncbi.nlm.nih.gov/pubmed/37379280 http://dx.doi.org/10.1371/journal.pone.0287557 |
Similar items
- The multi-modal fusion in visual question answering: a review of attention mechanisms
  by: Lu, Siyu, et al.
  Published: (2023)
- Multi-Modal Explicit Sparse Attention Networks for Visual Question Answering
  by: Guo, Zihan, et al.
  Published: (2020)
- An effective spatial relational reasoning networks for visual question answering
  by: Shen, Xiang, et al.
  Published: (2022)
- Multi-View Visual Question Answering with Active Viewpoint Selection
  by: Qiu, Yue, et al.
  Published: (2020)
- Parallel multi-head attention and term-weighted question embedding for medical visual question answering
  by: Manmadhan, Sruthy, et al.
  Published: (2023)