
Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework


Bibliographic Details
Main Authors: Ullah, Hayat; Munir, Arslan
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10381293/
https://www.ncbi.nlm.nih.gov/pubmed/37504807
http://dx.doi.org/10.3390/jimaging9070130
author Ullah, Hayat
Munir, Arslan
collection PubMed
description Vision-based human activity recognition (HAR) has emerged as one of the essential research areas in video analytics. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams, and they have shown impressive performance on this task. However, these methods focus exclusively on either model performance or computational efficiency, resulting in a biased trade-off between robustness and computational efficiency when dealing with the challenging HAR problem. To enhance both accuracy and computational efficiency, this paper presents a computationally efficient yet generic spatial–temporal cascaded framework that exploits deep discriminative spatial and temporal features for HAR. To represent human actions efficiently, we propose a dual attentional convolutional neural network (DA-CNN) architecture that leverages a unified channel–spatial attention mechanism to extract human-centric salient features from video frames. The dual channel–spatial attention layers, together with the convolutional layers, learn to be more selective in the spatial receptive fields that contain objects within the feature maps. The extracted discriminative salient features are then forwarded to a stacked bi-directional gated recurrent unit (Bi-GRU) for long-term temporal modeling and recognition of human actions using both forward and backward pass gradient learning. Extensive experiments are conducted on three publicly available human action datasets, and the results verify the effectiveness of the proposed framework (DA-CNN+Bi-GRU) over state-of-the-art methods in terms of model accuracy and inference runtime on each dataset. The DA-CNN+Bi-GRU framework attains an improvement in execution time of up to 167× in terms of frames per second compared to most contemporary action-recognition methods.
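The cascade described in the abstract (channel-spatial attention on each frame, then a bi-directional GRU over the frame sequence) can be illustrated with a minimal pure-Python sketch. This is not the authors' DA-CNN+Bi-GRU: the attention gates here are parameter-free (plain average pooling followed by a sigmoid) where the paper uses learned layers, the GRU cells omit biases and are not stacked, and every name (`dual_attention`, `pool`, `GRUCell`, `bi_gru`) and every size is a hypothetical choice made for illustration only.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def dual_attention(fmap):
    """fmap: C channels, each an H x W grid. Assumed, parameter-free gating."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Channel attention: one gate per channel from global average pooling.
    ch_gate = [sigmoid(sum(sum(row) for row in ch) / (H * W)) for ch in fmap]
    gated = [[[v * ch_gate[c] for v in row] for row in fmap[c]] for c in range(C)]
    # Spatial attention: one gate per location from the mean across channels.
    sp_gate = [[sigmoid(sum(gated[c][i][j] for c in range(C)) / C)
                for j in range(W)] for i in range(H)]
    return [[[gated[c][i][j] * sp_gate[i][j] for j in range(W)]
             for i in range(H)] for c in range(C)]

def pool(fmap):
    """Global average pooling: one feature value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

class GRUCell:
    """Bias-free GRU cell with random (untrained) weights."""
    def __init__(self, in_dim, hid_dim, rng):
        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.Wz, self.Wr, self.Wh = (mat(hid_dim, in_dim + hid_dim) for _ in range(3))

    def step(self, x, h):
        xh = x + h                                    # concatenate input and state
        z = [sigmoid(dot(w, xh)) for w in self.Wz]    # update gate
        r = [sigmoid(dot(w, xh)) for w in self.Wr]    # reset gate
        xrh = x + [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(dot(w, xrh)) for w in self.Wh]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def bi_gru(seq, fwd, bwd, hid_dim):
    """Run one cell forward and one backward in time, then concatenate states."""
    def run(cell, frames):
        h, out = [0.0] * hid_dim, []
        for x in frames:
            h = cell.step(x, h)
            out.append(h)
        return out
    forward = run(fwd, seq)
    backward = run(bwd, seq[::-1])[::-1]  # re-align backward states with time order
    return [f + b for f, b in zip(forward, backward)]

# Hypothetical input: 5 frames, each a 3-channel 4x4 feature map.
rng = random.Random(0)
frames = [[[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
           for _ in range(3)] for _ in range(5)]
features = [pool(dual_attention(f)) for f in frames]
hid = 2
states = bi_gru(features, GRUCell(3, hid, rng), GRUCell(3, hid, rng), hid)
print(len(states), len(states[0]))  # 5 time steps, each of size 2 * hid
```

In a real recognizer the per-step states would feed a classifier over action labels; that stage is left out here because the record does not describe it.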
format Online
Article
Text
id pubmed-10381293
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10381293 2023-07-29 Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework. Ullah, Hayat; Munir, Arslan. J Imaging. Article. MDPI 2023-06-26 /pmc/articles/PMC10381293/ /pubmed/37504807 http://dx.doi.org/10.3390/jimaging9070130 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework
topic Article