Video action recognition collaborative learning with dynamics via PSO-ConvNet Transformer

Recognizing human actions in video sequences, known as Human Action Recognition (HAR), is a challenging task in pattern recognition. While Convolutional Neural Networks (ConvNets) have shown remarkable success in image recognition, they are not always directly applicable to HAR, as temporal features...

Bibliographic Details
Main authors:	Nguyen, Huu Phong; Ribeiro, Bernardete
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10480209/ https://www.ncbi.nlm.nih.gov/pubmed/37670019 http://dx.doi.org/10.1038/s41598-023-39744-9

_version_	1785101742789099520
author	Nguyen, Huu Phong; Ribeiro, Bernardete
author_facet	Nguyen, Huu Phong; Ribeiro, Bernardete
author_sort	Nguyen, Huu Phong
collection	PubMed
description	Recognizing human actions in video sequences, known as Human Action Recognition (HAR), is a challenging task in pattern recognition. While Convolutional Neural Networks (ConvNets) have shown remarkable success in image recognition, they are not always directly applicable to HAR, as temporal features are critical for accurate classification. In this paper, we propose a novel dynamic PSO-ConvNet model for learning actions in videos, building on our recent work in image recognition. Our approach leverages a framework where the weight vector of each neural network represents the position of a particle in phase space, and particles share their current weight vectors and gradient estimates of the loss function. To extend our approach to video, we integrate ConvNets with state-of-the-art temporal methods such as Transformers and Recurrent Neural Networks. Our experimental results on the UCF-101 dataset demonstrate substantial improvements of up to 9% in accuracy, which confirms the effectiveness of our proposed method. In addition, we conducted experiments on larger and more varied datasets, including Kinetics-400 and HMDB-51, and found Collaborative Learning preferable to Non-Collaborative Learning (Individual Learning). Overall, our dynamic PSO-ConvNet model provides a promising direction for improving HAR by better capturing the spatio-temporal dynamics of human actions in videos. The code is available at https://github.com/leonlha/Video-Action-Recognition-Collaborative-Learning-with-Dynamics-via-PSO-ConvNet-Transformer.
format	Online Article Text
id	pubmed-10480209
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10480209 2023-09-07 Video action recognition collaborative learning with dynamics via PSO-ConvNet Transformer. Nguyen, Huu Phong; Ribeiro, Bernardete. Sci Rep. Article. Nature Publishing Group UK 2023-09-05 /pmc/articles/PMC10480209/ /pubmed/37670019 http://dx.doi.org/10.1038/s41598-023-39744-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle	Article Nguyen, Huu Phong Ribeiro, Bernardete Video action recognition collaborative learning with dynamics via PSO-ConvNet Transformer
title	Video action recognition collaborative learning with dynamics via PSO-ConvNet Transformer
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10480209/ https://www.ncbi.nlm.nih.gov/pubmed/37670019 http://dx.doi.org/10.1038/s41598-023-39744-9
work_keys_str_mv	AT nguyenhuuphong videoactionrecognitioncollaborativelearningwithdynamicsviapsoconvnettransformer AT ribeirobernardete videoactionrecognitioncollaborativelearningwithdynamicsviapsoconvnettransformer
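
The description above states that each network's weight vector is treated as a particle position and that particles share their current weight vectors and gradient estimates of the loss. A minimal sketch of that idea follows, using a toy quadratic loss in place of a ConvNet. The update rule (a gradient step followed by a pull toward the currently best particle), the coefficients `lr` and `pull`, and all function names are illustrative assumptions, not the formulation used in the paper or its repository.

```python
import random


def loss(w):
    # Toy quadratic loss standing in for a network's training loss (assumption).
    return sum((x - 3.0) ** 2 for x in w)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [2.0 * (x - 3.0) for x in w]


def collaborative_step(positions, lr=0.1, pull=0.2):
    # Each particle takes a gradient step on its own weights, then moves
    # part of the way toward the particle that currently has the lowest loss.
    best = min(positions, key=loss)
    updated = []
    for w in positions:
        stepped = [x - lr * g for x, g in zip(w, grad(w))]
        updated.append([x + pull * (b - x) for x, b in zip(stepped, best)])
    return updated


def train(num_particles=4, dim=3, steps=50, seed=0):
    # Random initial weight vectors, one per particle (hypothetical setup).
    rng = random.Random(seed)
    positions = [
        [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        for _ in range(num_particles)
    ]
    for _ in range(steps):
        positions = collaborative_step(positions)
    return positions


if __name__ == "__main__":
    for w in train():
        print([round(x, 4) for x in w], round(loss(w), 8))
```

Setting `pull=0.0` reduces the sketch to independent gradient descent per particle, which corresponds loosely to the Individual Learning baseline the description mentions.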
