Research on image content description in Chinese based on fusion of image global and local features

Bibliographic Details
Main authors:	Kong, Dongyi; Zhao, Hong; Zeng, Xiangyan
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9423645/ https://www.ncbi.nlm.nih.gov/pubmed/36037226 http://dx.doi.org/10.1371/journal.pone.0271322

Journal:	PLoS One
Published online:	2022-08-29
License:	© 2022 Kong et al. Open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Most image content modelling methods are designed for English descriptions, which differ from Chinese in syntactic structure. The few existing Chinese image description models do not fully integrate an image's global and local features, which limits their ability to represent image details. In this paper, an encoder-decoder architecture based on the fusion of global and local features is used to describe image content in Chinese. In the encoding stage, the global and local features of the image are extracted by a Convolutional Neural Network (CNN) and a target detection network, respectively, and fed to a feature fusion module. In the decoding stage, an image feature attention mechanism is used to calculate the weights of word vectors, and a new gating mechanism is added to the traditional Long Short-Term Memory (LSTM) network to emphasize the fused image features and the corresponding word vectors. In the description generation stage, the beam search algorithm is used to optimize the word vector generation process. Through these three stages, the integration of the image's global and local features is strengthened, allowing the model to capture the details of the image more fully. Experimental results show that the model improves the quality of Chinese descriptions of image content: compared with the baseline model, the CIDEr score improves by 20.07%, and the other evaluation metrics also improve significantly.
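The abstract does not say how the fusion module combines the two feature sets or how the attention weights are scored. The sketch below is only an illustration under two assumptions of ours: fusion by appending the CNN's global vector to every detected-region vector, and dot-product soft attention driven by a query vector standing in for the decoder state. The names `fuse` and `attend` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def fuse(global_feat: Vector, local_feats: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: append the global (CNN) vector to each local (detected region) vector."""
    return [region + global_feat for region in local_feats]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, fused: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product soft attention over the fused region vectors.

    `query` stands in for the decoder state (an assumption) and must have the
    same length as each fused vector. Returns (weights, context vector).
    """
    scores = [sum(q * f for q, f in zip(query, region)) for region in fused]
    weights = softmax(scores)
    dim = len(fused[0])
    # Context vector = attention-weighted sum of the fused region vectors.
    context = [sum(w * region[d] for w, region in zip(weights, fused)) for d in range(dim)]
    return weights, context
```

Concatenation keeps the global scene vector available at every region, so the attention weights can favour the regions that best match the current decoding step while the global context is never dropped.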

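The abstract describes the decoder's gating mechanism only as a new gate added to the traditional LSTM. As a hedged illustration, the following sketch adds one extra gate, here called `g_img` (a hypothetical name), that decides per dimension how much of the attended image context enters an otherwise standard LSTM step. The gate's inputs, its placement, the omission of bias terms, and the constant toy weights from `constant_params` are all assumptions, not the paper's formulation.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def gated_lstm_step(x: Vector, ctx: Vector, h: Vector, c: Vector,
                    params: Dict[str, Matrix]) -> Tuple[Vector, Vector, Vector]:
    """One LSTM step plus a hypothetical image gate (bias terms omitted for brevity).

    x: word vector, ctx: attended fused image context, h/c: previous hidden/cell state.
    Returns (new hidden state, new cell state, image gate values).
    """
    # Hypothetical extra gate: a weight in (0, 1) per context dimension,
    # computed from the word vector, previous hidden state and the context itself.
    g_img = [sigmoid(z) for z in matvec(params["W_img"], x + h + ctx)]
    gated_ctx = [g * v for g, v in zip(g_img, ctx)]

    # Standard LSTM gates over the concatenated input [x; gated_ctx; h].
    inp = x + gated_ctx + h
    i = [sigmoid(z) for z in matvec(params["W_i"], inp)]    # input gate
    f = [sigmoid(z) for z in matvec(params["W_f"], inp)]    # forget gate
    o = [sigmoid(z) for z in matvec(params["W_o"], inp)]    # output gate
    u = [math.tanh(z) for z in matvec(params["W_u"], inp)]  # candidate cell update

    c_new = [fk * ck + ik * uk for fk, ck, ik, uk in zip(f, c, i, u)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new, g_img


def constant_params(x_dim: int, ctx_dim: int, hidden: int, value: float = 0.1) -> Dict[str, Matrix]:
    """Toy weights filled with a constant, only so the step can be run end to end."""
    def mat(rows: int, cols: int) -> Matrix:
        return [[value] * cols for _ in range(rows)]

    in_dim = x_dim + ctx_dim + hidden
    params = {"W_img": mat(ctx_dim, x_dim + hidden + ctx_dim)}
    for name in ("W_i", "W_f", "W_o", "W_u"):
        params[name] = mat(hidden, in_dim)
    return params
```

Because `g_img` is a sigmoid output, it can only attenuate the image context, never amplify it; the rest of the step is the textbook LSTM cell.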

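Beam search itself is a standard technique: instead of greedily keeping the single most probable next word, it keeps the `beam_width` highest-scoring partial sentences at each step. The sketch below runs it over a hypothetical toy bigram table of Chinese caption tokens (`TOY_BIGRAMS`, with probabilities invented for illustration rather than taken from the paper), chosen so that the greedy choice and the beam-search result differ.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy bigram table: P(next token | previous token). Invented numbers.
TOY_BIGRAMS: Dict[str, Dict[str, float]] = {
    "<s>": {"一只": 0.6, "两只": 0.4},
    "一只": {"狗": 0.4, "猫": 0.3, "鸟": 0.3},
    "两只": {"狗": 0.9, "猫": 0.1},
    "狗": {"</s>": 1.0},
    "猫": {"</s>": 1.0},
    "鸟": {"</s>": 1.0},
}


def beam_search(next_probs: Callable[[str], Dict[str, float]], start: str, end: str,
                beam_width: int = 2, max_len: int = 10) -> Tuple[List[str], float]:
    """Return the best (sequence, summed log-probability) found by beam search."""
    beams: List[Tuple[List[str], float]] = [([start], 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        # Expand every live beam by every possible next token.
        candidates = [
            (seq + [tok], score + math.log(p))
            for seq, score in beams
            for tok, p in next_probs(seq[-1]).items()
        ]
        # Keep only the `beam_width` best expansions.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end:
                finished.append((seq, score))  # sentence complete, stop growing it
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Sequences still unfinished at max_len also compete.
    return max(finished + beams, key=lambda item: item[1])


def toy_next(token: str) -> Dict[str, float]:
    return TOY_BIGRAMS[token]
```

With `beam_width=1` the search reduces to greedy decoding and returns 一只 狗 (probability 0.6 × 0.4 = 0.24), whereas `beam_width=2` keeps the initially weaker 两只 prefix alive and finds 两只 狗 (0.4 × 0.9 = 0.36). Scores here are raw summed log-probabilities without length normalization, which practical captioning decoders often add.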