
Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory

Bibliographic Details
Main Authors: Cao, Pengfei, Yang, Zhongyi, Sun, Liang, Liang, Yanchun, Yang, Mary Qu, Guan, Renchu
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8758065/
https://www.ncbi.nlm.nih.gov/pubmed/35035261
http://dx.doi.org/10.1007/s11063-018-09973-5
author Cao, Pengfei
Yang, Zhongyi
Sun, Liang
Liang, Yanchun
Yang, Mary Qu
Guan, Renchu
collection PubMed
description Automatically describing the contents of an image in natural language has drawn much attention because it not only integrates computer vision and natural language processing but also has practical applications. Using an end-to-end approach, we propose a bidirectional semantic attention-based guiding of long short-term memory (Bag-LSTM) model for image captioning. The proposed model consciously refines image features based on previously generated text. By fine-tuning the parameters of convolutional neural networks, Bag-LSTM obtains more text-related image features via feedback propagation than other models. As opposed to existing guidance-LSTM methods, which directly add image features into each unit of an LSTM block, our fine-tuned model dynamically leverages more text-conditional image features, acquired by the semantic attention mechanism, as guidance information. Moreover, we exploit bidirectional gLSTM as the caption generator, which is capable of learning long-term relations between visual features and semantic information by making use of both historical and future contextual information. In addition, variations of the Bag-LSTM model are proposed in an effort to sufficiently describe high-level visual-language interactions. Experiments on the Flickr8k and MSCOCO benchmark datasets demonstrate the effectiveness of the model compared with baseline algorithms; for example, it scores 51.2% higher than BRNN on the CIDEr metric.
format Online
Article
Text
id pubmed-8758065
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-8758065 2022-01-13 Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory. Cao, Pengfei; Yang, Zhongyi; Sun, Liang; Liang, Yanchun; Yang, Mary Qu; Guan, Renchu. Neural Process Lett, Article. 2019-08 (online 2019-01-11). /pmc/articles/PMC8758065/ /pubmed/35035261 http://dx.doi.org/10.1007/s11063-018-09973-5 Text en Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8758065/
https://www.ncbi.nlm.nih.gov/pubmed/35035261
http://dx.doi.org/10.1007/s11063-018-09973-5
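
Note on the method described in the abstract: the guidance idea is that attention over CNN image-region features is conditioned on the text generated so far, and the attended (text-conditional) feature is fed into the LSTM at every decoding step as guidance. The sketch below is a minimal, hypothetical single-direction illustration in PyTorch, not the authors' Bag-LSTM implementation; all module names, dimensions, and the omission of the bidirectional gLSTM and of CNN fine-tuning are assumptions made for illustration.

# Minimal, hypothetical sketch of text-conditional attention used as LSTM guidance.
# This is NOT the authors' code; names and sizes are assumptions.
import torch
import torch.nn as nn


class AttentionGuidedDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, feat_dim=512, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Attention scores depend on the decoder state, i.e. on the text so far.
        self.att_query = nn.Linear(hidden_dim, feat_dim)
        # LSTM input is [word embedding ; attended image feature], the "guidance".
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, word_ids, regions, state):
        """One decoding step.
        word_ids: (B,) previous word indices
        regions:  (B, R, feat_dim) image-region features from a CNN
        state:    (h, c) LSTM state, each (B, hidden_dim)
        """
        h, c = state
        # Text-conditional attention over image regions.
        scores = torch.bmm(regions, self.att_query(h).unsqueeze(2)).squeeze(2)  # (B, R)
        alpha = torch.softmax(scores, dim=1)
        guidance = torch.bmm(alpha.unsqueeze(1), regions).squeeze(1)            # (B, feat_dim)
        # Guidance is injected at every step, alongside the word embedding.
        x = torch.cat([self.embed(word_ids), guidance], dim=1)
        h, c = self.lstm(x, (h, c))
        return self.out(h), (h, c)


if __name__ == "__main__":
    B, R, V = 2, 36, 1000
    dec = AttentionGuidedDecoder(V)
    regions = torch.randn(B, R, 512)
    state = (torch.zeros(B, 512), torch.zeros(B, 512))
    logits, state = dec.step(torch.zeros(B, dtype=torch.long), regions, state)
    print(logits.shape)  # torch.Size([2, 1000])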