Fusion of Multi-Modal Features to Enhance Dense Video Caption
Dense video captioning is a task that aims to help computers analyze the content of a video by generating abstract captions for a sequence of video frames. However, most existing methods use only the visual features of the video and ignore the audio features, which are also essential for understanding...
Main Authors: | , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10304565/ https://www.ncbi.nlm.nih.gov/pubmed/37420732 http://dx.doi.org/10.3390/s23125565 |