Class-dependent and cross-modal memory network considering sentimental features for video-based captioning

The video-based commonsense captioning task aims to add multiple commonsense descriptions to video captions to understand video content better. This paper focuses on the importance of cross-modal mapping. We propose a combined framework called Class-dependent and Cross-modal Memory Network considering SENtimental features (CCMN-SEN) for video-based captioning to enhance commonsense caption generation. First, we develop a class-dependent memory for recording the alignment between video features and text; it only allows cross-modal interactions and generation on cross-modal matrices that share the same labels. Then, to understand the sentiments conveyed in the videos and generate accurate captions, we add sentiment features to facilitate commonsense caption generation. Experimental results demonstrate that our proposed CCMN-SEN significantly outperforms state-of-the-art methods. These results have practical significance for better understanding of video content.

Bibliographic Details
Main Authors: Xiong, Haitao; Zhou, Yuchen; Liu, Jiaming; Cai, Yuanyuan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2023
Subjects: Psychology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9975600/
https://www.ncbi.nlm.nih.gov/pubmed/36874867
http://dx.doi.org/10.3389/fpsyg.2023.1124369
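
The abstract describes a class-dependent memory that permits cross-modal interaction only between video features and text that share the same label. Purely as an illustration of that idea, and not the authors' actual implementation, this can be sketched as a label-masked cross-modal attention step; all function names, shapes, and the exact masking rule below are assumptions:

```python
import numpy as np

def class_masked_attention(video_feats, text_feats, video_labels, text_labels):
    """Cross-modal attention in which a text token may only attend to
    video features carrying the same class label.

    Hypothetical sketch: the names, shapes, and masking rule are
    assumptions, not the paper's actual implementation.
    """
    # Raw similarity between every text token and every video feature.
    scores = text_feats @ video_feats.T            # (n_text, n_video)
    # Only same-label (text, video) pairs may interact.
    mask = text_labels[:, None] == video_labels[None, :]
    scores = np.where(mask, scores, -np.inf)
    # Row-wise softmax over the surviving entries; rows with no
    # same-label video feature get all-zero attention weights.
    weights = np.zeros_like(scores)
    valid = mask.any(axis=1)
    shifted = scores[valid] - scores[valid].max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    weights[valid] = exp / exp.sum(axis=1, keepdims=True)
    # Label-restricted attended video representation per text token.
    return weights @ video_feats
```

Because masked pairs receive zero attention weight, a text token's output is a convex combination of same-label video features only, which is one way to read "cross-modal interactions ... on cross-modal matrices that share the same labels."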
Collection: PubMed
ID: pubmed-9975600
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Front Psychol
Published Online: 2023-02-15
License: https://creativecommons.org/licenses/by/4.0/ (CC BY)
Copyright © 2023 Xiong, Zhou, Liu and Cai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.