Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates

Bibliographic Details
Main Authors:	Moratelli, Nicholas; Barraco, Manuele; Morelli, Davide; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9921965/ https://www.ncbi.nlm.nih.gov/pubmed/36772326 http://dx.doi.org/10.3390/s23031286

Description
Summary:	Research related to the fashion and e-commerce domains is gaining attention in the computer vision and multimedia communities. Following this trend, this article tackles the task of generating fine-grained and accurate natural language descriptions of fashion items, a recently proposed and under-explored challenge that is still far from solved. To overcome the limitations of previous approaches, a transformer-based captioning model was designed that integrates an external textual memory accessed through k-nearest neighbor (kNN) searches. Architecturally, the proposed transformer model reads and retrieves items from the external memory through cross-attention operations, and regulates the flow of information coming from the external memory with a novel fully attentive gate. Experiments were carried out on the fashion captioning dataset (FACAD), which contains more than 130k fine-grained descriptions. The results validate the proposed approach and its architectural strategies against carefully designed baselines and state-of-the-art approaches. The presented method consistently outperforms all compared approaches, demonstrating its effectiveness for fashion image captioning.
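
As a rough illustration of the retrieve, attend, and gate pattern named in the summary, the following is a minimal Python sketch and not the authors' implementation. It assumes that memory entries are already encoded as small numeric vectors, uses cosine similarity for the kNN search, and replaces the article's fully attentive gate with a plain sigmoid gate. The function names and the scalar gate parameters (`w_hidden`, `w_memory`, `bias`) are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_retrieve(query, memory_keys, k):
    """Indices of the k memory keys most similar to the query, best first."""
    ranked = sorted(range(len(memory_keys)),
                    key=lambda i: cosine(query, memory_keys[i]),
                    reverse=True)
    return ranked[:k]

def cross_attend(query, retrieved):
    """Single-head scaled dot-product attention of one query over retrieved vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * r for q, r in zip(query, vec)) / scale for vec in retrieved]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * vec[d] for w, vec in zip(weights, retrieved))
            for d in range(len(query))]

def gated_fusion(hidden, memory_out, w_hidden, w_memory, bias):
    """Blend hidden and memory vectors with a per-dimension sigmoid gate.

    Hypothetical stand-in with scalar parameters, not the article's gate.
    """
    fused = []
    for h, m in zip(hidden, memory_out):
        gate = 1.0 / (1.0 + math.exp(-(w_hidden * h + w_memory * m + bias)))
        fused.append((1.0 - gate) * h + gate * m)
    return fused

# Toy usage: three memory entries, retrieve two, attend, then fuse.
memory_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.1]
neighbors = knn_retrieve(query, memory_keys, k=2)
memory_out = cross_attend(query, [memory_keys[i] for i in neighbors])
fused = gated_fusion(query, memory_out, w_hidden=0.0, w_memory=0.0, bias=0.0)
print(neighbors, fused)
```

In this sketch a gate value near 0 keeps the model's own hidden state and a value near 1 passes the retrieved memory through, which is one simple way to picture regulating the flow of external information.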

Similar Items