
Towards Generating and Evaluating Iconographic Image Captions of Artworks


Bibliographic Details
Main Author: Cetinic, Eva
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8404909/
https://www.ncbi.nlm.nih.gov/pubmed/34460759
http://dx.doi.org/10.3390/jimaging7080123
collection PubMed
description Automatically generating accurate and meaningful textual descriptions of images is an ongoing research challenge. Recently, considerable progress has been made by adopting multimodal deep learning approaches that integrate vision and language. However, image captioning models are most commonly developed on datasets of natural images, and few contributions address the domain of artwork images. One of the main reasons is the lack of large-scale art datasets with adequate image-text pairs. Another is that generating accurate descriptions of artwork images is particularly challenging, because descriptions of artworks are more complex and can include multiple levels of interpretation; this also makes generated captions of artwork images especially difficult to evaluate. This work addresses some of those challenges by utilizing a large-scale dataset of artwork images annotated with concepts from the Iconclass classification system. Using this dataset, a captioning model is developed by fine-tuning a transformer-based vision-language pretrained model. Because of the complex relations between image and text pairs in the artwork domain, the generated captions are evaluated using several quantitative and qualitative approaches: performance is assessed with standard image captioning metrics as well as a recently introduced reference-free metric. The quality of the generated captions and the model’s capacity to generalize to new data are explored by applying the model to another art dataset and comparing the relation between commonly generated captions and the genre of the artworks. The overall results suggest that the model can generate meaningful captions that indicate a stronger relevance to the art historical context, particularly in comparison to captions obtained from models trained only on natural image datasets.
id pubmed-8404909
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
published online: 2021-07-23
© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).