Visual-Text Reference Pretraining Model for Image Captioning

People can accurately describe an image by constantly referring to the visual information and key text information of the image. Inspired by this idea, we propose the VTR-PTM (Visual-Text Reference Pretraining Model) for image captioning. First, based on the pretraining model (BERT/UNIML), we design...

Bibliographic Details
Main authors:	Li, Pengfei; Zhang, Min; Lin, Peijie; Wan, Jian; Jiang, Ming
Format:	Online Article Text
Language:	English
Published:	Hindawi 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8799330/ https://www.ncbi.nlm.nih.gov/pubmed/35096050 http://dx.doi.org/10.1155/2022/9400999
