Fusion of Multi-Modal Features to Enhance Dense Video Caption
Dense video caption is a task that aims to help computers analyze the content of a video by generating abstract captions for a sequence of video frames. However, most of the existing methods only use visual features in the video and ignore the audio features that are also essential for understanding...
Main Authors: | Huang, Xuefei; Chan, Ka-Hou; Wu, Weifan; Sheng, Hao; Ke, Wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10304565/ https://www.ncbi.nlm.nih.gov/pubmed/37420732 http://dx.doi.org/10.3390/s23125565 |
Similar Items
- Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph
  by: Han, Shixing, et al.
  Published: (2023)
- Research on Video Captioning Based on Multifeature Fusion
  by: Zhao, Hong, et al.
  Published: (2022)
- Modality attention fusion model with hybrid multi-head self-attention for video understanding
  by: Zhuang, Xuqiang, et al.
  Published: (2022)
- Class-dependent and cross-modal memory network considering sentimental features for video-based captioning
  by: Xiong, Haitao, et al.
  Published: (2023)
- Combining Sparse and Dense Features to Improve Multi-Modal Registration for Brain DTI Images
  by: Moldovanu, Simona, et al.
  Published: (2020)