Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition
In recent years, deep learning techniques have excelled in video action recognition. However, currently commonly used video action recognition models minimize the importance of different video frames and spatial regions within some specific frames when performing action recognition, which makes it difficult for the models to adequately extract spatiotemporal features from the video data. In this paper, an action recognition method based on improved residual convolutional neural networks (CNNs) for video frames and spatial attention modules is proposed to address this problem. The network can guide what and where to emphasize or suppress with essentially little computational cost using the video frame attention module and the spatial attention module. It also employs a two-level attention module to emphasize feature information along the temporal and spatial dimensions, respectively, highlighting the more important frames in the overall video sequence and the more important spatial regions in some specific frames. Specifically, we create the video frame and spatial attention map by successively adding the video frame attention module and the spatial attention module to aggregate the spatial and temporal dimensions of the intermediate feature maps of the CNNs to obtain different feature descriptors, thus directing the network to focus more on important video frames and more contributing spatial regions. The experimental results further show that the network performs well on the UCF-101 and HMDB-51 datasets.
Main Authors: Chen, Bo; Meng, Fangzhou; Tang, Hongying; Tong, Guanjun
Format: Online Article Text
Language: English
Published: MDPI, 2023
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919151/ https://www.ncbi.nlm.nih.gov/pubmed/36772770 http://dx.doi.org/10.3390/s23031707
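The abstract above describes the mechanism only in prose: a frame-level (temporal) attention step that re-weights whole frames, followed by a spatial attention step that re-weights positions within each frame, both built from pooled statistics of intermediate feature maps. As a rough illustration of that idea only (not the paper's actual implementation, which uses learned convolutional attention modules inside 3D residual blocks), here is a minimal NumPy sketch; the names `frame_attention`, `spatial_attention`, and `two_level_attention` are hypothetical, and the unlearned pooling-plus-sigmoid weights merely stand in for the trained feature descriptors:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def frame_attention(x):
    # x: (T, C, H, W) stack of per-frame feature maps.
    # Pool each frame down to two scalars (average and max activation),
    # combine them into a per-frame weight in (0, 1), and rescale frames.
    avg = x.mean(axis=(1, 2, 3))           # (T,)
    mx = x.max(axis=(1, 2, 3))             # (T,)
    weights = sigmoid(avg + mx)            # (T,) temporal attention weights
    return x * weights[:, None, None, None]

def spatial_attention(x):
    # Pool across the channel axis to get per-location statistics, then
    # build a (0, 1) mask over H x W that re-weights every position.
    avg = x.mean(axis=1, keepdims=True)    # (T, 1, H, W)
    mx = x.max(axis=1, keepdims=True)      # (T, 1, H, W)
    mask = sigmoid(avg + mx)               # (T, 1, H, W) spatial attention map
    return x * mask

def two_level_attention(x):
    # Frame-level attention first, spatial attention second, matching the
    # sequential ordering the abstract describes.
    return spatial_attention(frame_attention(x))

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16, 7, 7))  # 8 frames, 16 channels, 7x7 maps
out = two_level_attention(feats)
print(out.shape)  # (8, 16, 7, 7): attention re-weights, it never reshapes
```

Because both attention factors lie in (0, 1), the output is an element-wise damping of the input; in the real network the weights are learned end-to-end so that informative frames and regions are suppressed least.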
_version_ | 1784886752848117760 |
author | Chen, Bo Meng, Fangzhou Tang, Hongying Tong, Guanjun |
author_facet | Chen, Bo Meng, Fangzhou Tang, Hongying Tong, Guanjun |
author_sort | Chen, Bo |
collection | PubMed |
description | In recent years, deep learning techniques have excelled in video action recognition. However, currently commonly used video action recognition models minimize the importance of different video frames and spatial regions within some specific frames when performing action recognition, which makes it difficult for the models to adequately extract spatiotemporal features from the video data. In this paper, an action recognition method based on improved residual convolutional neural networks (CNNs) for video frames and spatial attention modules is proposed to address this problem. The network can guide what and where to emphasize or suppress with essentially little computational cost using the video frame attention module and the spatial attention module. It also employs a two-level attention module to emphasize feature information along the temporal and spatial dimensions, respectively, highlighting the more important frames in the overall video sequence and the more important spatial regions in some specific frames. Specifically, we create the video frame and spatial attention map by successively adding the video frame attention module and the spatial attention module to aggregate the spatial and temporal dimensions of the intermediate feature maps of the CNNs to obtain different feature descriptors, thus directing the network to focus more on important video frames and more contributing spatial regions. The experimental results further show that the network performs well on the UCF-101 and HMDB-51 datasets. |
format | Online Article Text |
id | pubmed-9919151 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9919151 2023-02-12 Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition Chen, Bo Meng, Fangzhou Tang, Hongying Tong, Guanjun Sensors (Basel) Article In recent years, deep learning techniques have excelled in video action recognition. However, currently commonly used video action recognition models minimize the importance of different video frames and spatial regions within some specific frames when performing action recognition, which makes it difficult for the models to adequately extract spatiotemporal features from the video data. In this paper, an action recognition method based on improved residual convolutional neural networks (CNNs) for video frames and spatial attention modules is proposed to address this problem. The network can guide what and where to emphasize or suppress with essentially little computational cost using the video frame attention module and the spatial attention module. It also employs a two-level attention module to emphasize feature information along the temporal and spatial dimensions, respectively, highlighting the more important frames in the overall video sequence and the more important spatial regions in some specific frames. Specifically, we create the video frame and spatial attention map by successively adding the video frame attention module and the spatial attention module to aggregate the spatial and temporal dimensions of the intermediate feature maps of the CNNs to obtain different feature descriptors, thus directing the network to focus more on important video frames and more contributing spatial regions. The experimental results further show that the network performs well on the UCF-101 and HMDB-51 datasets. MDPI 2023-02-03 /pmc/articles/PMC9919151/ /pubmed/36772770 http://dx.doi.org/10.3390/s23031707 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Chen, Bo Meng, Fangzhou Tang, Hongying Tong, Guanjun Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition |
title | Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition |
title_full | Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition |
title_fullStr | Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition |
title_full_unstemmed | Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition |
title_short | Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition |
title_sort | two-level attention module based on spurious-3d residual networks for human action recognition |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919151/ https://www.ncbi.nlm.nih.gov/pubmed/36772770 http://dx.doi.org/10.3390/s23031707 |
work_keys_str_mv | AT chenbo twolevelattentionmodulebasedonspurious3dresidualnetworksforhumanactionrecognition AT mengfangzhou twolevelattentionmodulebasedonspurious3dresidualnetworksforhumanactionrecognition AT tanghongying twolevelattentionmodulebasedonspurious3dresidualnetworksforhumanactionrecognition AT tongguanjun twolevelattentionmodulebasedonspurious3dresidualnetworksforhumanactionrecognition |