Visual-Text Reference Pretraining Model for Image Captioning

People can accurately describe an image by constantly referring to the visual information and key text information of the image. Inspired by this idea, we propose the VTR-PTM (Visual-Text Reference Pretraining Model) for image captioning. First, based on the pretraining model (BERT/UniLM), we design...

Bibliographic Details
Main Authors: Li, Pengfei; Zhang, Min; Lin, Peijie; Wan, Jian; Jiang, Ming
Format: Online Article Text
Language: English
Published: Hindawi 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8799330/
https://www.ncbi.nlm.nih.gov/pubmed/35096050
http://dx.doi.org/10.1155/2022/9400999