Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network
In image captioning models, the main challenge in describing an image is identifying all the objects, precisely capturing the relationships between them, and producing varied captions. Over the past few years, many methods have been proposed, from attribute-to-attribute comparison approaches to techniques that handle semantics and their relationships. …
Main Authors: | Javanmardi, Shima; Latif, Ali Mohammad; Sadeghi, Mohammad Taghi; Jahanbanifard, Mehrdad; Bonsangue, Marcello; Verbeek, Fons J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9654532/ https://www.ncbi.nlm.nih.gov/pubmed/36366079 http://dx.doi.org/10.3390/s22218376 |
_version_ | 1784828955810856960 |
---|---|
author | Javanmardi, Shima Latif, Ali Mohammad Sadeghi, Mohammad Taghi Jahanbanifard, Mehrdad Bonsangue, Marcello Verbeek, Fons J. |
author_facet | Javanmardi, Shima Latif, Ali Mohammad Sadeghi, Mohammad Taghi Jahanbanifard, Mehrdad Bonsangue, Marcello Verbeek, Fons J. |
author_sort | Javanmardi, Shima |
collection | PubMed |
description | In image captioning models, the main challenge in describing an image is identifying all the objects, precisely capturing the relationships between them, and producing varied captions. Over the past few years, many methods have been proposed, from attribute-to-attribute comparison approaches to techniques that handle semantics and their relationships. Despite these improvements, existing techniques handle positional and geometrical attributes inadequately. The reason is that most of the abovementioned approaches depend on Convolutional Neural Networks (CNNs) for object detection, and CNNs are notorious for failing to capture equivariance and rotational invariance in objects. Moreover, the pooling layers in CNNs cause valuable information to be lost. Inspired by recent successful approaches, this paper introduces a novel framework for extracting meaningful descriptions based on a parallelized capsule network that describes the content of images through a high-level understanding of their semantic content. The main contribution of this paper is a new method that not only overcomes the limitations of CNNs but also generates descriptions with a wide variety of words by using Wikipedia. In our framework, capsules focus on generating meaningful descriptions with more detailed spatial and geometrical attributes for a given set of images by considering the positions of the entities as well as their relationships. Qualitative experiments on the benchmark MS-COCO dataset show that our framework outperforms state-of-the-art image captioning models in describing the semantic content of images. |
format | Online Article Text |
id | pubmed-9654532 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-96545322022-11-15 Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network Javanmardi, Shima Latif, Ali Mohammad Sadeghi, Mohammad Taghi Jahanbanifard, Mehrdad Bonsangue, Marcello Verbeek, Fons J. Sensors (Basel) Article In image captioning models, the main challenge in describing an image is identifying all the objects by precisely considering the relationships between the objects and producing various captions. Over the past few years, many methods have been proposed, from an attribute-to-attribute comparison approach to handling issues related to semantics and their relationships. Despite the improvements, the existing techniques suffer from inadequate positional and geometrical attributes concepts. The reason is that most of the abovementioned approaches depend on Convolutional Neural Networks (CNNs) for object detection. CNN is notorious for failing to detect equivariance and rotational invariance in objects. Moreover, the pooling layers in CNNs cause valuable information to be lost. Inspired by the recent successful approaches, this paper introduces a novel framework for extracting meaningful descriptions based on a parallelized capsule network that describes the content of images through a high level of understanding of the semantic contents of an image. The main contribution of this paper is proposing a new method that not only overrides the limitations of CNNs but also generates descriptions with a wide variety of words by using Wikipedia. In our framework, capsules focus on the generation of meaningful descriptions with more detailed spatial and geometrical attributes for a given set of images by considering the position of the entities as well as their relationships. Qualitative experiments on the benchmark dataset MS-COCO show that our framework outperforms state-of-the-art image captioning models when describing the semantic content of the images. MDPI 2022-11-01 /pmc/articles/PMC9654532/ /pubmed/36366079 http://dx.doi.org/10.3390/s22218376 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Javanmardi, Shima Latif, Ali Mohammad Sadeghi, Mohammad Taghi Jahanbanifard, Mehrdad Bonsangue, Marcello Verbeek, Fons J. Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network |
title | Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network |
title_full | Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network |
title_fullStr | Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network |
title_full_unstemmed | Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network |
title_short | Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network |
title_sort | caps captioning: a modern image captioning approach based on improved capsule network |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9654532/ https://www.ncbi.nlm.nih.gov/pubmed/36366079 http://dx.doi.org/10.3390/s22218376 |
work_keys_str_mv | AT javanmardishima capscaptioningamodernimagecaptioningapproachbasedonimprovedcapsulenetwork AT latifalimohammad capscaptioningamodernimagecaptioningapproachbasedonimprovedcapsulenetwork AT sadeghimohammadtaghi capscaptioningamodernimagecaptioningapproachbasedonimprovedcapsulenetwork AT jahanbanifardmehrdad capscaptioningamodernimagecaptioningapproachbasedonimprovedcapsulenetwork AT bonsanguemarcello capscaptioningamodernimagecaptioningapproachbasedonimprovedcapsulenetwork AT verbeekfonsj capscaptioningamodernimagecaptioningapproachbasedonimprovedcapsulenetwork |
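The abstract in this record contrasts capsule networks with CNN pooling but does not reproduce the paper's architecture. As a rough, generic illustration only, the sketch below shows the two standard capsule mechanisms the abstract alludes to, the "squash" non-linearity and routing-by-agreement, in the textbook (Sabour et al.) formulation; all function names, shapes, and hyperparameters here are illustrative assumptions and are not taken from the article.

```python
# Minimal NumPy sketch of capsule squash + dynamic routing (generic formulation,
# NOT the architecture proposed in the cited paper).
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Scale vector s so its length lies in (0, 1) while preserving its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iterations=3):
    """Route prediction vectors u_hat [in_caps, out_caps, dim] to output capsules."""
    in_caps, out_caps, dim = u_hat.shape
    b = np.zeros((in_caps, out_caps))                            # routing logits
    for _ in range(num_iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)     # coupling coefficients (softmax over outputs)
        s = (c[..., None] * u_hat).sum(axis=0)                   # weighted sum per output capsule
        v = squash(s)                                            # output capsule vectors
        b += np.einsum('iod,od->io', u_hat, v)                   # increase logits where predictions agree
    return v

# Toy usage: 8 lower-level capsules routed to 3 higher-level 16-D capsules.
u_hat = np.random.randn(8, 3, 16)
v = dynamic_routing(u_hat)
print(v.shape)  # (3, 16): vector length ~ presence probability, orientation ~ pose attributes
```

The point of the sketch is the design choice the abstract highlights: instead of discarding spatial detail with max/average pooling, routing-by-agreement forms a weighted consensus over lower-level capsules, so positional and geometrical attributes remain encoded in the output vectors.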