Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph
Dense video captioning (DVC) aims at generating a description for each scene in a video. Despite attractive progress on this task, previous works usually concentrate only on exploiting visual features while neglecting audio information in the video, resulting in inaccurate scene event location. In th...
Main authors: | , , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9950023/ https://www.ncbi.nlm.nih.gov/pubmed/36855683 http://dx.doi.org/10.1007/s40747-023-00998-5 |