
Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition


Bibliographic Details
Main Authors: Chen, Bo, Meng, Fangzhou, Tang, Hongying, Tong, Guanjun
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919151/
https://www.ncbi.nlm.nih.gov/pubmed/36772770
http://dx.doi.org/10.3390/s23031707
author Chen, Bo
Meng, Fangzhou
Tang, Hongying
Tong, Guanjun
collection PubMed
description In recent years, deep learning techniques have excelled at video action recognition. However, commonly used video action recognition models overlook the differing importance of individual video frames, and of spatial regions within specific frames, which makes it difficult for them to adequately extract spatiotemporal features from video data. In this paper, an action recognition method based on improved residual convolutional neural networks (CNNs) with video frame and spatial attention modules is proposed to address this problem. Using the video frame attention module and the spatial attention module, the network learns what and where to emphasize or suppress at essentially negligible computational cost. The two-level attention module emphasizes feature information along the temporal and spatial dimensions, respectively, highlighting the more important frames in the overall video sequence and the more important spatial regions within specific frames. Specifically, we create the video frame and spatial attention maps by successively applying the video frame attention module and the spatial attention module, which aggregate the spatial and temporal dimensions of the intermediate feature maps of the CNNs into distinct feature descriptors, thus directing the network to focus on the more important video frames and the spatial regions that contribute most. Experimental results show that the network performs well on the UCF-101 and HMDB-51 datasets.
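The description above outlines the two-level attention mechanism only at a high level. As a reading aid, here is a minimal PyTorch sketch of a block of that kind: a video frame (temporal) attention module that pools each frame's spatial dimensions to score frames, followed by a spatial attention module that pools across channels to score locations. The class name, layer sizes, reduction ratio, and pooling choices are illustrative assumptions; the record does not specify the authors' actual implementation.

```python
import torch
import torch.nn as nn

class TwoLevelAttention(nn.Module):
    """Sketch of a two-level (frame + spatial) attention block.

    Operates on 5-D CNN feature maps of shape (N, C, T, H, W).
    All design details here are assumptions for illustration.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Frame (temporal) attention: an MLP scores each of the T frames.
        self.frame_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 1),
        )
        # Spatial attention: a conv scores each (H, W) location from
        # channel-pooled average and max maps.
        self.spatial_conv = nn.Conv3d(2, 1, kernel_size=(1, 7, 7),
                                      padding=(0, 3, 3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, t, h, w = x.shape
        # Frame attention: aggregate the spatial dims into a per-frame
        # descriptor, then weight each frame.
        frame_desc = x.mean(dim=(3, 4)).permute(0, 2, 1)      # (N, T, C)
        frame_w = torch.sigmoid(self.frame_mlp(frame_desc))   # (N, T, 1)
        x = x * frame_w.permute(0, 2, 1).reshape(n, 1, t, 1, 1)
        # Spatial attention: aggregate the channel dim into two maps,
        # then weight each spatial location.
        avg_map = x.mean(dim=1, keepdim=True)                 # (N, 1, T, H, W)
        max_map = x.amax(dim=1, keepdim=True)                 # (N, 1, T, H, W)
        spatial_w = torch.sigmoid(
            self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return x * spatial_w

# Example usage on dummy intermediate features:
block = TwoLevelAttention(channels=256)
feats = torch.randn(2, 256, 8, 14, 14)   # (N, C, T, H, W)
out = block(feats)                        # same shape, attention-reweighted
```

In a (2+1)D-style ("spurious-3D") residual network, a block like this would typically sit after a residual stage, reweighting that stage's intermediate feature maps before they flow into the next stage.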
format Online
Article
Text
id pubmed-9919151
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9919151 2023-02-12 Sensors (Basel) Article MDPI 2023-02-03 /pmc/articles/PMC9919151/ /pubmed/36772770 http://dx.doi.org/10.3390/s23031707 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919151/
https://www.ncbi.nlm.nih.gov/pubmed/36772770
http://dx.doi.org/10.3390/s23031707