Image Captioning Using Motion-CNN with Object Detection


Bibliographic Details
Main Authors: Iwamura, Kiyohiko, Louhi Kasahara, Jun Younes, Moro, Alessandro, Yamashita, Atsushi, Asama, Hajime
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7916682/
https://www.ncbi.nlm.nih.gov/pubmed/33578956
http://dx.doi.org/10.3390/s21041270
author Iwamura, Kiyohiko
Louhi Kasahara, Jun Younes
Moro, Alessandro
Yamashita, Atsushi
Asama, Hajime
collection PubMed
description Automatic image captioning has many important applications, such as describing visual content for visually impaired people or indexing images on the internet. Recently, deep learning-based image captioning models have been researched extensively. For caption generation, they learn the relation between image features and the words included in captions. However, image features might not be relevant for certain words, such as verbs. Our earlier reported method therefore used motion features alongside image features to generate captions that include verbs. However, it used all of the motion features; since not all of them contributed positively to the captioning process, the unnecessary motion features decreased captioning accuracy. Here, we conduct experiments with motion features to analyze thoroughly the reasons for this decline in accuracy, and we propose a novel, end-to-end trainable method for image caption generation that alleviates it. Our proposed model was evaluated on three datasets: MSR-VTT2016-Image, MSCOCO, and several copyright-free images. Results demonstrate that our proposed method improves caption generation performance.
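
To make the idea in the abstract concrete, below is a minimal PyTorch-style sketch of one way to fuse per-image CNN features with motion features while using object-detection scores to gate out motion features that do not help captioning. All module names, dimensions, and the gating scheme are illustrative assumptions, not the authors' published architecture.

import torch
import torch.nn as nn

class GatedMotionCaptioner(nn.Module):
    # Hypothetical model: not the paper's architecture, just the general
    # pattern of "image features + detection-gated motion features -> captions".
    def __init__(self, img_dim=2048, motion_dim=1024, det_dim=80,
                 embed_dim=512, vocab_size=10000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.motion_proj = nn.Linear(motion_dim, embed_dim)
        # Gate driven by per-class detection confidences: suppresses motion
        # features when the detected objects suggest they are irrelevant.
        self.gate = nn.Sequential(nn.Linear(det_dim, embed_dim), nn.Sigmoid())
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Decoder LSTM consumes [word embedding; fused visual context].
        self.decoder = nn.LSTM(embed_dim * 2, embed_dim, batch_first=True)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, img_feat, motion_feat, det_scores, captions):
        # img_feat: (B, img_dim), motion_feat: (B, motion_dim),
        # det_scores: (B, det_dim), captions: (B, T) token ids.
        v = self.img_proj(img_feat)                                # (B, E)
        m = self.motion_proj(motion_feat) * self.gate(det_scores)  # gated (B, E)
        fused = v + m                                              # (B, E)
        w = self.embed(captions)                                   # (B, T, E)
        ctx = fused.unsqueeze(1).expand(-1, w.size(1), -1)         # (B, T, E)
        h, _ = self.decoder(torch.cat([w, ctx], dim=-1))           # (B, T, E)
        return self.out(h)                                         # (B, T, V)

# Example with random tensors standing in for CNN image features, motion-CNN
# features, detector class scores, and tokenized captions.
model = GatedMotionCaptioner()
logits = model(torch.randn(2, 2048), torch.randn(2, 1024),
               torch.rand(2, 80), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 10000])
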
format Online
Article
Text
id pubmed-7916682
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7916682 2021-03-01 Image Captioning Using Motion-CNN with Object Detection. Sensors (Basel), Article. MDPI 2021-02-10 /pmc/articles/PMC7916682/ /pubmed/33578956 http://dx.doi.org/10.3390/s21041270 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Image Captioning Using Motion-CNN with Object Detection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7916682/
https://www.ncbi.nlm.nih.gov/pubmed/33578956
http://dx.doi.org/10.3390/s21041270