Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph

Dense video captioning (DVC) aims at generating a description for each scene in a video. Despite attractive progress on this task, previous works usually concentrate only on exploiting visual features while neglecting audio information in the video, resulting in inaccurate scene event location. In th...

Bibliographic Details
Main Authors:	Han, Shixing; Liu, Jin; Zhang, Jinyingming; Gong, Peizhu; Zhang, Xiliang; He, Huihua
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2023
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9950023/ https://www.ncbi.nlm.nih.gov/pubmed/36855683 http://dx.doi.org/10.1007/s40747-023-00998-5

Similar Items

Fusion of Multi-Modal Features to Enhance Dense Video Caption
by: Huang, Xuefei, et al.
Published: (2023)

Dense captioning and multidimensional evaluations for indoor robotic scenes
by: Wang, Hua, et al.
Published: (2023)

Unbiased pangenome graphs
by: Garrison, Erik, et al.
Published: (2022)

Lightweight Scene Text Recognition Based on Transformer
by: Luan, Xin, et al.
Published: (2023)

Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation
by: Jung, Gayoung, et al.
Published: (2021)

UAT: Universal Attention Transformer for Video Captioning
by: Im, Heeju, et al.
Published: (2022)

Movie Scene Event Extraction with Graph Attention Network Based on Argument Correlation Information
by: Yi, Qian, et al.
Published: (2023)

CMANet: Cross-Modality Attention Network for Indoor-Scene Semantic Segmentation
by: Zhu, Longze, et al.
Published: (2022)

Video captioning with stacked attention and semantic hard pull
by: Rahman, Md. Mushfiqur, et al.
Published: (2021)

A Lightweight Recurrent Grouping Attention Network for Video Super-Resolution
by: Zhu, Yonggui, et al.
Published: (2023)

Applications of Deep Learning for Dense Scenes Analysis in Agriculture: A Review
by: Zhang, Qian, et al.
Published: (2020)

Lightweight convolutional neural network for aircraft small target real-time detection in Airport videos in complex scenes
by: Li, Weidong, et al.
Published: (2022)

Spatio-Temporal Attention Model for Foreground Detection in Cross-Scene Surveillance Videos
by: Liang, Dong, et al.
Published: (2019)

OpenSceneGraph 3 Cookbook
by: Wang, Rui, et al.
Published: (2012)

Cross-Modality Person Re-Identification via Local Paired Graph Attention Network
by: Zhou, Jianglin, et al.
Published: (2023)

Film and Video Quality Optimization Using Attention Mechanism-Embedded Lightweight Neural Network Model
by: Ma, Youwen
Published: (2022)

Indoor Scene Change Captioning Based on Multimodality Data
by: Qiu, Yue, et al.
Published: (2020)

Modality attention fusion model with hybrid multi-head self-attention for video understanding
by: Zhuang, Xuqiang, et al.
Published: (2022)

Object Tracking in RGB-T Videos Using Modal-Aware Attention Network and Competitive Learning
by: Zhang, Hui, et al.
Published: (2020)

Class-dependent and cross-modal memory network considering sentimental features for video-based captioning
by: Xiong, Haitao, et al.
Published: (2023)

Pay attention to doctor–patient dialogues: Multi-modal knowledge graph attention image-text embedding for COVID-19 diagnosis
by: Zheng, Wenbo, et al.
Published: (2021)

An Efficient and Lightweight Convolutional Neural Network for Remote Sensing Image Scene Classification
by: Yu, Donghang, et al.
Published: (2020)

Lightweight Image Restoration Network for Strong Noise Removal in Nuclear Radiation Scenes
by: Sun, Xin, et al.
Published: (2021)

Retracted: Film and Video Quality Optimization Using Attention Mechanism-Embedded Lightweight Neural Network Model
by: Intelligence and Neuroscience, Computational
Published: (2023)

A Lightweight Visual Simultaneous Localization and Mapping Method with a High Precision in Dynamic Scenes
by: Zhang, Qi, et al.
Published: (2023)

Eye Movements during Dynamic Scene Viewing are Affected by Visual Attention Skills and Events of the Scene: Evidence from First-Person Shooter Gameplay Videos
by: Holm, Suvi K., et al.
Published: (2021)

Overt attentional correlates of memorability of scene images and their relationships to scene semantics
by: Lyu, Muxuan, et al.
Published: (2020)

MIRN: A multi-interest retrieval network with sequence-to-interest EM routing
by: Zhang, Xiliang, et al.
Published: (2023)

Irregular Scene Text Detection Based on a Graph Convolutional Network
by: Zhang, Shiyu, et al.
Published: (2023)

Lightweight and Efficient Image Dehazing Network Guided by Transmission Estimation from Real-World Hazy Scenes
by: Li, Zhan, et al.
Published: (2021)

A Lightweight Protocol for Secure Video Streaming
by: Venčkauskas, Algimantas, et al.
Published: (2018)

Graphle: Interactive exploration of large, dense graphs
by: Huttenhower, Curtis, et al.
Published: (2009)

eHUGS: Enhanced Hierarchical Unbiased Graph Shrinkage for Efficient Groupwise Registration
by: Wu, Guorong, et al.
Published: (2016)

Attention, Awareness, and the Perception of Auditory Scenes
by: Snyder, Joel S., et al.
Published: (2012)

The Influence of Action Video Games on Attentional Functions Across Visual and Auditory Modalities
by: Wu, Xia, et al.
Published: (2021)

Enhanced Adjacency Matrix-Based Lightweight Graph Convolution Network for Action Recognition
by: Zhang, Daqing, et al.
Published: (2023)

Focus prediction of medical microscopic images based on Lightweight Densely Connected with Squeeze-and-Excitation Network
by: Jiang, Hesong, et al.
Published: (2023)

CloudDenseNet: Lightweight Ground-Based Cloud Classification Method for Large-Scale Datasets Based on Reconstructed DenseNet
by: Li, Sheng, et al.
Published: (2023)

OpenSceneGraph 3.0: Beginner's Guide
by: Wang, Rui, et al.
Published: (2010)

Social Image Captioning: Exploring Visual Attention and User Attention
by: Wang, Leiquan, et al.
Published: (2018)
