Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph
Dense video captioning (DVC) aims to generate a description for each scene in a video. Despite encouraging progress on this task, previous works usually concentrate only on visual features and neglect the audio information in the video, which leads to inaccurate localization of scene events. In th...
Main authors: | Han, Shixing; Liu, Jin; Zhang, Jinyingming; Gong, Peizhu; Zhang, Xiliang; He, Huihua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9950023/ https://www.ncbi.nlm.nih.gov/pubmed/36855683 http://dx.doi.org/10.1007/s40747-023-00998-5 |
Similar items
- Fusion of Multi-Modal Features to Enhance Dense Video Caption
  by: Huang, Xuefei, et al.
  Published: (2023)
- Dense captioning and multidimensional evaluations for indoor robotic scenes
  by: Wang, Hua, et al.
  Published: (2023)
- Unbiased pangenome graphs
  by: Garrison, Erik, et al.
  Published: (2022)
- Lightweight Scene Text Recognition Based on Transformer
  by: Luan, Xin, et al.
  Published: (2023)
- Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation
  by: Jung, Gayoung, et al.
  Published: (2021)