Video captioning based on vision transformer and reinforcement learning
Global encoding of visual features in video captioning is important for improving the description accuracy. In this paper, we propose a video captioning method that combines Vision Transformer (ViT) and reinforcement learning. Firstly, ResNet-152 and ResNeXt-101 are used to extract features from vid...
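For orientation, below is a minimal PyTorch sketch of the pipeline the abstract outlines: per-frame features from a ResNet-152 backbone, a Transformer encoder standing in for the ViT-style global encoder, and a simple decoder producing caption tokens. The ResNeXt-101 motion branch, the authors' actual decoder, and the reinforcement-learning fine-tuning stage are not shown, and all layer sizes and names here are illustrative assumptions rather than the paper's configuration.

```python
# Hedged sketch of the described pipeline (assumptions: PyTorch/torchvision available;
# dimensions, decoder, and vocabulary size are placeholders, not the authors' setup).
import torch
import torch.nn as nn
from torchvision import models

class CaptionSketch(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512, num_layers=4):
        super().__init__()
        # Per-frame appearance features from a ResNet-152 backbone (fc layer removed).
        resnet = models.resnet152(weights=None)
        self.frame_encoder = nn.Sequential(*list(resnet.children())[:-1])
        self.proj = nn.Linear(2048, d_model)
        # Transformer encoder standing in for the ViT-style global encoder.
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        # Simple GRU decoder conditioned on the pooled video representation.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, frames, captions):
        # frames: (batch, num_frames, 3, 224, 224); captions: (batch, seq_len) of token ids
        b, t = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1)).flatten(1)  # (b*t, 2048)
        feats = self.proj(feats).view(b, t, -1)                      # (b, t, d_model)
        memory = self.encoder(feats)                                 # global video encoding
        ctx = memory.mean(dim=1, keepdim=True)                       # pooled context vector
        emb = self.embed(captions) + ctx                             # condition decoder on context
        hidden, _ = self.decoder(emb)
        return self.out(hidden)                                      # (b, seq_len, vocab_size) logits
```

In practice a model like this would first be trained with cross-entropy on caption tokens and then fine-tuned with a reinforcement-learning objective on a sequence-level reward, but those training details are beyond this sketch.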
| Main Authors: | Zhao, Hong; Chen, Zhiwen; Guo, Lan; Han, Zeyu |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | PeerJ Inc., 2022 |
| Subjects: | |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9044334/ https://www.ncbi.nlm.nih.gov/pubmed/35494808 http://dx.doi.org/10.7717/peerj-cs.916 |
Similar Items
- Gamma and vega hedging using deep distributional reinforcement learning
  by: Cao, Jay, et al.
  Published: (2023)
- Enhancing the robustness of vision transformer defense against adversarial attacks based on squeeze-and-excitation module
  by: Chang, YouKang, et al.
  Published: (2023)
- Identifying the role of vision transformer for skin cancer—A scoping review
  by: Khan, Sulaiman, et al.
  Published: (2023)
- A Unifying Framework for Reinforcement Learning and Planning
  by: Moerland, Thomas M., et al.
  Published: (2022)
- Towards the portability of knowledge in reinforcement learning-based systems for automatic drone navigation
  by: Barreiro, José M., et al.
  Published: (2023)