Class-dependent and cross-modal memory network considering sentimental features for video-based captioning

The video-based commonsense captioning task aims to add multiple commonsense descriptions to video captions to understand video content better. This paper focuses on the importance of cross-modal mapping. We propose a combined framework called Class-dependent and Cross-modal Memory Network considering SENtimental features (CCMN-SEN) for video-based captioning to enhance commonsense caption generation. First, we develop a class-dependent memory for recording the alignment between video features and text; it only allows cross-modal interactions and generation on cross-modal matrices that share the same labels. Then, to understand the sentiments conveyed in the videos and generate accurate captions, we add sentiment features to facilitate commonsense caption generation. Experimental results demonstrate that our proposed CCMN-SEN significantly outperforms state-of-the-art methods. These results have practical significance for better understanding of video content.

Bibliographic Details
Main Authors: Xiong, Haitao; Zhou, Yuchen; Liu, Jiaming; Cai, Yuanyuan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2023
Subjects: Psychology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9975600/
https://www.ncbi.nlm.nih.gov/pubmed/36874867
http://dx.doi.org/10.3389/fpsyg.2023.1124369
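
The abstract describes a class-dependent memory that permits cross-modal interaction only between video features and text that share the same label. Purely as an illustration of that idea, and not the authors' actual implementation, this can be sketched as a label-masked cross-modal attention step; all function names, shapes, and the exact masking rule below are assumptions:

```python
import numpy as np

def class_masked_attention(video_feats, text_feats, video_labels, text_labels):
    """Cross-modal attention in which a text token may only attend to
    video features carrying the same class label.

    Hypothetical sketch: the names, shapes, and masking rule are
    assumptions, not the paper's actual implementation.
    """
    # Raw similarity between every text token and every video feature.
    scores = text_feats @ video_feats.T            # (n_text, n_video)
    # Only same-label (text, video) pairs may interact.
    mask = text_labels[:, None] == video_labels[None, :]
    scores = np.where(mask, scores, -np.inf)
    # Row-wise softmax over the surviving entries; rows with no
    # same-label video feature get all-zero attention weights.
    weights = np.zeros_like(scores)
    valid = mask.any(axis=1)
    shifted = scores[valid] - scores[valid].max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    weights[valid] = exp / exp.sum(axis=1, keepdims=True)
    # Label-restricted attended video representation per text token.
    return weights @ video_feats
```

Because masked pairs receive zero attention weight, a text token's output is a convex combination of same-label video features only, which is one way to read "cross-modal interactions ... on cross-modal matrices that share the same labels."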
Collection: PubMed
ID: pubmed-9975600
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Front Psychol
Published Online: 2023-02-15
License: https://creativecommons.org/licenses/by/4.0/ (CC BY)
Copyright © 2023 Xiong, Zhou, Liu and Cai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.