Class-dependent and cross-modal memory network considering sentimental features for video-based captioning
The video-based commonsense captioning task adds multiple commonsense descriptions to video captions to better convey video content. This paper considers the importance of cross-modal mapping. We propose a combined framework, the Class-dependent and Cross-modal Memory Network considering SENtimental features (CCMN-SEN), for video-based captioning to enhance commonsense caption generation. First, we develop a class-dependent memory that records the alignment between video features and text; it allows cross-modal interaction and generation only on cross-modal matrices that share the same labels. Second, to capture the sentiments conveyed in the videos and generate accurate captions, we add sentiment features that facilitate commonsense caption generation. Experimental results demonstrate that CCMN-SEN significantly outperforms state-of-the-art methods. These results have practical significance for better understanding video content.
Main Authors: Xiong, Haitao; Zhou, Yuchen; Liu, Jiaming; Cai, Yuanyuan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2023
Subjects: Psychology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9975600/ https://www.ncbi.nlm.nih.gov/pubmed/36874867 http://dx.doi.org/10.3389/fpsyg.2023.1124369
_version_ | 1784898907659042816 |
author | Xiong, Haitao Zhou, Yuchen Liu, Jiaming Cai, Yuanyuan |
author_facet | Xiong, Haitao Zhou, Yuchen Liu, Jiaming Cai, Yuanyuan |
author_sort | Xiong, Haitao |
collection | PubMed |
description | The video-based commonsense captioning task adds multiple commonsense descriptions to video captions to better convey video content. This paper considers the importance of cross-modal mapping. We propose a combined framework, the Class-dependent and Cross-modal Memory Network considering SENtimental features (CCMN-SEN), for video-based captioning to enhance commonsense caption generation. First, we develop a class-dependent memory that records the alignment between video features and text; it allows cross-modal interaction and generation only on cross-modal matrices that share the same labels. Second, to capture the sentiments conveyed in the videos and generate accurate captions, we add sentiment features that facilitate commonsense caption generation. Experimental results demonstrate that CCMN-SEN significantly outperforms state-of-the-art methods. These results have practical significance for better understanding video content. |
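The abstract's "class-dependent memory" idea, restricting cross-modal interaction to video/text entries that share a class label, can be sketched as a label mask applied to a cross-modal attention matrix. The sketch below is illustrative only: the function name, feature shapes, and the choice of dot-product attention are assumptions, not the authors' implementation.

```python
import numpy as np

def class_dependent_attention(video_feats, text_feats, video_labels, text_labels):
    """Cross-modal attention where each video feature may attend only to
    text features that share its class label (illustrative sketch)."""
    # Raw similarity between every video/text feature pair: shape (V, T).
    scores = video_feats @ text_feats.T
    # Boolean mask that is True only where the class labels match.
    mask = video_labels[:, None] == text_labels[None, :]
    # Disallow cross-label interactions by setting their scores to -inf.
    scores = np.where(mask, scores, -np.inf)
    # Row-wise softmax over the remaining (same-label) entries.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Attended text representation for each video feature: shape (V, D).
    return weights @ text_feats

# Toy usage: video feature 1 (label 1) can only attend to text feature 1.
video_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
text_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
out = class_dependent_attention(video_feats, text_feats,
                                np.array([0, 1]), np.array([0, 1, 0]))
```

Note that each video row must have at least one same-label text entry for the softmax to be defined; a real implementation would handle empty rows explicitly.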
format | Online Article Text |
id | pubmed-9975600 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9975600 2023-03-02 Class-dependent and cross-modal memory network considering sentimental features for video-based captioning Xiong, Haitao Zhou, Yuchen Liu, Jiaming Cai, Yuanyuan Front Psychol Psychology The video-based commonsense captioning task aims to add multiple commonsense descriptions to video captions to understand video content better. This paper aims to consider the importance of cross-modal mapping. We propose a combined framework called Class-dependent and Cross-modal Memory Network considering SENtimental features (CCMN-SEN) for Video-based Captioning to enhance commonsense caption generation. Firstly, we develop class-dependent memory for recording the alignment between video features and text. It only allows cross-modal interactions and generation on cross-modal matrices that share the same labels. Then, to understand the sentiments conveyed in the videos and generate accurate captions, we add sentiment features to facilitate commonsense caption generation. Experiment results demonstrate that our proposed CCMN-SEN significantly outperforms the state-of-the-art methods. These results have practical significance for understanding video content better. Frontiers Media S.A. 2023-02-15 /pmc/articles/PMC9975600/ /pubmed/36874867 http://dx.doi.org/10.3389/fpsyg.2023.1124369 Text en Copyright © 2023 Xiong, Zhou, Liu and Cai. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Psychology Xiong, Haitao Zhou, Yuchen Liu, Jiaming Cai, Yuanyuan Class-dependent and cross-modal memory network considering sentimental features for video-based captioning |
title | Class-dependent and cross-modal memory network considering sentimental features for video-based captioning |
title_full | Class-dependent and cross-modal memory network considering sentimental features for video-based captioning |
title_fullStr | Class-dependent and cross-modal memory network considering sentimental features for video-based captioning |
title_full_unstemmed | Class-dependent and cross-modal memory network considering sentimental features for video-based captioning |
title_short | Class-dependent and cross-modal memory network considering sentimental features for video-based captioning |
title_sort | class-dependent and cross-modal memory network considering sentimental features for video-based captioning |
topic | Psychology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9975600/ https://www.ncbi.nlm.nih.gov/pubmed/36874867 http://dx.doi.org/10.3389/fpsyg.2023.1124369 |
work_keys_str_mv | AT xionghaitao classdependentandcrossmodalmemorynetworkconsideringsentimentalfeaturesforvideobasedcaptioning AT zhouyuchen classdependentandcrossmodalmemorynetworkconsideringsentimentalfeaturesforvideobasedcaptioning AT liujiaming classdependentandcrossmodalmemorynetworkconsideringsentimentalfeaturesforvideobasedcaptioning AT caiyuanyuan classdependentandcrossmodalmemorynetworkconsideringsentimentalfeaturesforvideobasedcaptioning |