Video captioning based on vision transformer and reinforcement learning
Global encoding of visual features in video captioning is important for improving the description accuracy. In this paper, we propose a video captioning method that combines Vision Transformer (ViT) and reinforcement learning. Firstly, ResNet-152 and ResNeXt-101 are used to extract features from vid...
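For orientation, below is a minimal PyTorch sketch of the pipeline the abstract outlines: per-frame features from a ResNet-152 backbone, a Transformer encoder standing in for the ViT-style global encoder, and a simple decoder producing caption tokens. The ResNeXt-101 motion branch, the authors' actual decoder, and the reinforcement-learning fine-tuning stage are not shown, and all layer sizes and names here are illustrative assumptions rather than the paper's configuration.

```python
# Hedged sketch of the described pipeline (assumptions: PyTorch/torchvision available;
# dimensions, decoder, and vocabulary size are placeholders, not the authors' setup).
import torch
import torch.nn as nn
from torchvision import models

class CaptionSketch(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512, num_layers=4):
        super().__init__()
        # Per-frame appearance features from a ResNet-152 backbone (fc layer removed).
        resnet = models.resnet152(weights=None)
        self.frame_encoder = nn.Sequential(*list(resnet.children())[:-1])
        self.proj = nn.Linear(2048, d_model)
        # Transformer encoder standing in for the ViT-style global encoder.
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        # Simple GRU decoder conditioned on the pooled video representation.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, frames, captions):
        # frames: (batch, num_frames, 3, 224, 224); captions: (batch, seq_len) of token ids
        b, t = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1)).flatten(1)  # (b*t, 2048)
        feats = self.proj(feats).view(b, t, -1)                      # (b, t, d_model)
        memory = self.encoder(feats)                                 # global video encoding
        ctx = memory.mean(dim=1, keepdim=True)                       # pooled context vector
        emb = self.embed(captions) + ctx                             # condition decoder on context
        hidden, _ = self.decoder(emb)
        return self.out(hidden)                                      # (b, seq_len, vocab_size) logits
```

In practice a model like this would first be trained with cross-entropy on caption tokens and then fine-tuned with a reinforcement-learning objective on a sequence-level reward, but those training details are beyond this sketch.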
| Main Authors: | Zhao, Hong; Chen, Zhiwen; Guo, Lan; Han, Zeyu |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | PeerJ Inc., 2022 |
| Subjects: | |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9044334/ https://www.ncbi.nlm.nih.gov/pubmed/35494808 http://dx.doi.org/10.7717/peerj-cs.916 |
Similar Items
- Gamma and vega hedging using deep distributional reinforcement learning
  by: Cao, Jay, et al.
  Published: (2023)
- Enhancing the robustness of vision transformer defense against adversarial attacks based on squeeze-and-excitation module
  by: Chang, YouKang, et al.
  Published: (2023)
- Identifying the role of vision transformer for skin cancer—A scoping review
  by: Khan, Sulaiman, et al.
  Published: (2023)
- A Unifying Framework for Reinforcement Learning and Planning
  by: Moerland, Thomas M., et al.
  Published: (2022)
- Towards the portability of knowledge in reinforcement learning-based systems for automatic drone navigation
  by: Barreiro, José M., et al.
  Published: (2023)