Image to English translation and comprehension: INT2-VQA method based on inter-modality and intra-modality collaborations

Existing visual question answering methods typically concentrate only on visual targets in images, ignoring the key textual content in the images, thereby limiting the depth and accuracy of image content comprehension. Inspired by this, we pay attention to the task of text-based visual question answ...

Bibliographic Details
Main author: Sheng, Xianli
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10468077/
https://www.ncbi.nlm.nih.gov/pubmed/37647277
http://dx.doi.org/10.1371/journal.pone.0290315