Image Captioning Using Motion-CNN with Object Detection
Automatic image captioning has many important applications, such as the depiction of visual contents for visually impaired people or the indexing of images on the internet. Recently, deep learning-based image captioning models have been researched extensively. For caption generation, they learn the...
Main Authors: Iwamura, Kiyohiko; Louhi Kasahara, Jun Younes; Moro, Alessandro; Yamashita, Atsushi; Asama, Hajime
Format: Online Article Text
Language: English
Published: MDPI, 2021
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7916682/ https://www.ncbi.nlm.nih.gov/pubmed/33578956 http://dx.doi.org/10.3390/s21041270
_version_ | 1783657533830856704 |
author | Iwamura, Kiyohiko; Louhi Kasahara, Jun Younes; Moro, Alessandro; Yamashita, Atsushi; Asama, Hajime
author_facet | Iwamura, Kiyohiko; Louhi Kasahara, Jun Younes; Moro, Alessandro; Yamashita, Atsushi; Asama, Hajime
author_sort | Iwamura, Kiyohiko |
collection | PubMed |
description | Automatic image captioning has many important applications, such as the depiction of visual contents for visually impaired people or the indexing of images on the internet. Recently, deep learning-based image captioning models have been researched extensively. For caption generation, they learn the relation between image features and words included in the captions. However, image features might not be relevant for certain words such as verbs. Therefore, our earlier reported method included the use of motion features along with image features for generating captions including verbs. However, all the motion features were used. Since not all motion features contributed positively to the captioning process, unnecessary motion features decreased the captioning accuracy. As described herein, we use experiments with motion features for thorough analysis of the reasons for the decline in accuracy. We propose a novel, end-to-end trainable method for image caption generation that alleviates the decreased accuracy of caption generation. Our proposed model was evaluated using three datasets: MSR-VTT2016-Image, MSCOCO, and several copyright-free images. Results demonstrate that our proposed method improves caption generation performance. |
format | Online Article Text |
id | pubmed-7916682 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-79166822021-03-01 Image Captioning Using Motion-CNN with Object Detection Iwamura, Kiyohiko Louhi Kasahara, Jun Younes Moro, Alessandro Yamashita, Atsushi Asama, Hajime Sensors (Basel) Article Automatic image captioning has many important applications, such as the depiction of visual contents for visually impaired people or the indexing of images on the internet. Recently, deep learning-based image captioning models have been researched extensively. For caption generation, they learn the relation between image features and words included in the captions. However, image features might not be relevant for certain words such as verbs. Therefore, our earlier reported method included the use of motion features along with image features for generating captions including verbs. However, all the motion features were used. Since not all motion features contributed positively to the captioning process, unnecessary motion features decreased the captioning accuracy. As described herein, we use experiments with motion features for thorough analysis of the reasons for the decline in accuracy. We propose a novel, end-to-end trainable method for image caption generation that alleviates the decreased accuracy of caption generation. Our proposed model was evaluated using three datasets: MSR-VTT2016-Image, MSCOCO, and several copyright-free images. Results demonstrate that our proposed method improves caption generation performance. MDPI 2021-02-10 /pmc/articles/PMC7916682/ /pubmed/33578956 http://dx.doi.org/10.3390/s21041270 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article; Iwamura, Kiyohiko; Louhi Kasahara, Jun Younes; Moro, Alessandro; Yamashita, Atsushi; Asama, Hajime; Image Captioning Using Motion-CNN with Object Detection
title | Image Captioning Using Motion-CNN with Object Detection |
title_full | Image Captioning Using Motion-CNN with Object Detection |
title_fullStr | Image Captioning Using Motion-CNN with Object Detection |
title_full_unstemmed | Image Captioning Using Motion-CNN with Object Detection |
title_short | Image Captioning Using Motion-CNN with Object Detection |
title_sort | image captioning using motion-cnn with object detection |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7916682/ https://www.ncbi.nlm.nih.gov/pubmed/33578956 http://dx.doi.org/10.3390/s21041270 |
work_keys_str_mv | AT iwamurakiyohiko imagecaptioningusingmotioncnnwithobjectdetection AT louhikasaharajunyounes imagecaptioningusingmotioncnnwithobjectdetection AT moroalessandro imagecaptioningusingmotioncnnwithobjectdetection AT yamashitaatsushi imagecaptioningusingmotioncnnwithobjectdetection AT asamahajime imagecaptioningusingmotioncnnwithobjectdetection |
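The abstract above describes fusing CNN image features with motion features and using object detection to suppress motion features that do not contribute to captioning. The paper's implementation is not part of this record, so the following is only a minimal illustrative sketch under that high-level description: it assumes pre-extracted image features, motion features, and a per-image object-detection confidence, gates the motion features by that confidence, and decodes captions with a plain LSTM. All module names, feature dimensions, and the gating scheme are hypothetical, not the authors' architecture.

```python
# Minimal illustrative sketch (NOT the paper's implementation): fuse image
# features with object-detection-gated motion features, then decode captions
# with a standard LSTM. Dimensions and names are assumptions for illustration.
import torch
import torch.nn as nn

class FusionCaptioner(nn.Module):
    def __init__(self, img_dim=2048, motion_dim=1024, det_dim=1,
                 embed_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        # Gate derived from object-detection confidence: damps motion features
        # when no relevant object is detected in the image.
        self.gate = nn.Sequential(nn.Linear(det_dim, motion_dim), nn.Sigmoid())
        self.fuse = nn.Linear(img_dim + motion_dim, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feat, motion_feat, det_conf, captions):
        # img_feat: (B, img_dim), motion_feat: (B, motion_dim), det_conf: (B, 1)
        gated_motion = motion_feat * self.gate(det_conf)
        fused = self.fuse(torch.cat([img_feat, gated_motion], dim=1))
        # Prepend the fused visual feature as the first step of the sequence.
        tokens = self.embed(captions)                       # (B, T, embed_dim)
        seq = torch.cat([fused.unsqueeze(1), tokens], dim=1)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                             # (B, T+1, vocab_size)

# Toy usage with random tensors standing in for extracted features.
model = FusionCaptioner()
logits = model(torch.randn(2, 2048), torch.randn(2, 1024),
               torch.rand(2, 1), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 13, 10000])
```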