
Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition


Bibliographic Details
Main Authors: Chen, Bo, Meng, Fangzhou, Tang, Hongying, Tong, Guanjun
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919151/
https://www.ncbi.nlm.nih.gov/pubmed/36772770
http://dx.doi.org/10.3390/s23031707
author Chen, Bo
Meng, Fangzhou
Tang, Hongying
Tong, Guanjun
collection PubMed
description In recent years, deep learning techniques have excelled at video action recognition. However, commonly used video action recognition models overlook the differing importance of individual video frames, and of spatial regions within specific frames, which makes it difficult for them to adequately extract spatiotemporal features from video data. In this paper, an action recognition method based on improved residual convolutional neural networks (CNNs) with video frame and spatial attention modules is proposed to address this problem. Using the video frame attention module and the spatial attention module, the network learns what and where to emphasize or suppress at essentially negligible computational cost. The two-level attention module emphasizes feature information along the temporal and spatial dimensions, respectively, highlighting the more important frames in the overall video sequence and the more important spatial regions within specific frames. Specifically, we create the video frame and spatial attention maps by successively applying the video frame attention module and the spatial attention module, which aggregate the spatial and temporal dimensions of the intermediate feature maps of the CNNs into distinct feature descriptors, thus directing the network to focus on the more important video frames and the spatial regions that contribute most. Experimental results show that the network performs well on the UCF-101 and HMDB-51 datasets.
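The description above outlines the two-level attention mechanism only at a high level. As a reading aid, here is a minimal PyTorch sketch of a block of that kind: a video frame (temporal) attention module that pools each frame's spatial dimensions to score frames, followed by a spatial attention module that pools across channels to score locations. The class name, layer sizes, reduction ratio, and pooling choices are illustrative assumptions; the record does not specify the authors' actual implementation.

```python
import torch
import torch.nn as nn

class TwoLevelAttention(nn.Module):
    """Sketch of a two-level (frame + spatial) attention block.

    Operates on 5-D CNN feature maps of shape (N, C, T, H, W).
    All design details here are assumptions for illustration.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Frame (temporal) attention: an MLP scores each of the T frames.
        self.frame_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 1),
        )
        # Spatial attention: a conv scores each (H, W) location from
        # channel-pooled average and max maps.
        self.spatial_conv = nn.Conv3d(2, 1, kernel_size=(1, 7, 7),
                                      padding=(0, 3, 3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, t, h, w = x.shape
        # Frame attention: aggregate the spatial dims into a per-frame
        # descriptor, then weight each frame.
        frame_desc = x.mean(dim=(3, 4)).permute(0, 2, 1)      # (N, T, C)
        frame_w = torch.sigmoid(self.frame_mlp(frame_desc))   # (N, T, 1)
        x = x * frame_w.permute(0, 2, 1).reshape(n, 1, t, 1, 1)
        # Spatial attention: aggregate the channel dim into two maps,
        # then weight each spatial location.
        avg_map = x.mean(dim=1, keepdim=True)                 # (N, 1, T, H, W)
        max_map = x.amax(dim=1, keepdim=True)                 # (N, 1, T, H, W)
        spatial_w = torch.sigmoid(
            self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return x * spatial_w

# Example usage on dummy intermediate features:
block = TwoLevelAttention(channels=256)
feats = torch.randn(2, 256, 8, 14, 14)   # (N, C, T, H, W)
out = block(feats)                        # same shape, attention-reweighted
```

In a (2+1)D-style ("spurious-3D") residual network, a block like this would typically sit after a residual stage, reweighting that stage's intermediate feature maps before they flow into the next stage.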
format Online
Article
Text
id pubmed-9919151
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9919151 2023-02-12 Sensors (Basel) Article MDPI 2023-02-03 /pmc/articles/PMC9919151/ /pubmed/36772770 http://dx.doi.org/10.3390/s23031707 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919151/
https://www.ncbi.nlm.nih.gov/pubmed/36772770
http://dx.doi.org/10.3390/s23031707