
Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates

Research related to fashion and e-commerce domains is gaining attention in computer vision and multimedia communities. Following this trend, this article tackles the task of generating fine-grained and accurate natural language descriptions of fashion items, a recently-proposed and under-explored ch...


Bibliographic Details
Main authors: Moratelli, Nicholas, Barraco, Manuele, Morelli, Davide, Cornia, Marcella, Baraldi, Lorenzo, Cucchiara, Rita
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9921965/
https://www.ncbi.nlm.nih.gov/pubmed/36772326
http://dx.doi.org/10.3390/s23031286
_version_ 1784887439144255488
author Moratelli, Nicholas
Barraco, Manuele
Morelli, Davide
Cornia, Marcella
Baraldi, Lorenzo
Cucchiara, Rita
author_sort Moratelli, Nicholas
collection PubMed
description Research related to fashion and e-commerce domains is gaining attention in computer vision and multimedia communities. Following this trend, this article tackles the task of generating fine-grained and accurate natural language descriptions of fashion items, a recently proposed and under-explored challenge that is still far from being solved. To overcome the limitations of previous approaches, a transformer-based captioning model was designed with the integration of external textual memory that could be accessed through k-nearest neighbor (kNN) searches. From an architectural point of view, the proposed transformer model can read and retrieve items from the external memory through cross-attention operations, and tune the flow of information coming from the external memory thanks to a novel fully attentive gate. Experimental analyses were carried out on the fashion captioning dataset (FACAD) for fashion image captioning, which contains more than 130k fine-grained descriptions, validating the effectiveness of the proposed approach and the proposed architectural strategies in comparison with carefully designed baselines and state-of-the-art approaches. The presented method consistently outperforms all compared approaches, demonstrating its effectiveness for fashion image captioning.
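The description above names three mechanisms: a kNN lookup into an external memory, cross-attention over the retrieved items, and a gate that regulates how much retrieved information flows into the model. The following is a minimal NumPy sketch of that general pattern, not the authors' implementation. All function names, the cosine-similarity metric, the single-head attention, and the sigmoid gate formula are illustrative assumptions; the record does not specify how the paper's "fully attentive gate" is computed.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def knn_retrieve(query, memory_keys, k):
    """Indices of the k memory rows most similar to `query`.

    Assumption: cosine similarity; the record does not state the metric.
    """
    q = query / np.linalg.norm(query)
    m = memory_keys / np.linalg.norm(memory_keys, axis=1, keepdims=True)
    sims = m @ q                   # (n_mem,)
    return np.argsort(-sims)[:k]   # most similar first


def cross_attention(tokens, retrieved):
    """Single-head scaled dot-product attention.

    tokens:    (T, d) decoder-side states acting as queries
    retrieved: (k, d) retrieved memory items acting as keys and values
    """
    d = tokens.shape[-1]
    weights = softmax(tokens @ retrieved.T / np.sqrt(d), axis=-1)  # (T, k)
    return weights @ retrieved                                     # (T, d)


def gated_fusion(tokens, attended, w_gate, b_gate):
    """Hypothetical sigmoid gate (an assumption, not the paper's formula).

    The gate is computed from the concatenated states and scales, per
    feature, how much retrieved information is added to each token.
    """
    logits = np.concatenate([tokens, attended], axis=-1) @ w_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-logits))     # values in (0, 1)
    return tokens + gate * attended, gate


# Toy usage with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
d, n_mem, T, k = 8, 50, 4, 5
memory = rng.normal(size=(n_mem, d))                    # external memory
image_feature = memory[7] + 0.01 * rng.normal(size=d)   # query near item 7
idx = knn_retrieve(image_feature, memory, k)

tokens = rng.normal(size=(T, d))
attended = cross_attention(tokens, memory[idx])
w_gate = 0.1 * rng.normal(size=(2 * d, d))
b_gate = np.zeros(d)
fused, gate = gated_fusion(tokens, attended, w_gate, b_gate)
```

In this sketch a gate value near 0 leaves a token's state almost unchanged, while a value near 1 adds the attended memory content in full, which is one simple way to "tune the flow of information" from the memory.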
format Online
Article
Text
id pubmed-9921965
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9921965 2023-02-12 Sensors (Basel) Article MDPI 2023-01-23 /pmc/articles/PMC9921965/ /pubmed/36772326 http://dx.doi.org/10.3390/s23031286 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9921965/
https://www.ncbi.nlm.nih.gov/pubmed/36772326
http://dx.doi.org/10.3390/s23031286