
An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos

Bibliographic Details
Main Authors: Basha, S. H. Shabbeer, Pulabaigari, Viswanath, Mukherjee, Snehasis
Format: Online Article Text
Language: English
Published: Springer US 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9084266/
https://www.ncbi.nlm.nih.gov/pubmed/35572387
http://dx.doi.org/10.1007/s11042-022-12856-6
_version_ 1784703574961291264
author Basha, S. H. Shabbeer
Pulabaigari, Viswanath
Mukherjee, Snehasis
author_facet Basha, S. H. Shabbeer
Pulabaigari, Viswanath
Mukherjee, Snehasis
author_sort Basha, S. H. Shabbeer
collection PubMed
description We propose a novel video sampling scheme for human action recognition in videos, using a Gaussian weighting function. Traditionally, in deep learning-based human activity recognition approaches, either a few random frames or every k-th frame of the video is considered for training the 3D CNN, where k is a small positive integer such as 4, 5, or 6. This kind of sampling reduces the volume of the input data, which speeds up training of the network and also avoids overfitting to some extent, thus enhancing the performance of the 3D CNN model. In the proposed video sampling technique, k consecutive frames of a video are aggregated into a single frame by computing a Gaussian-weighted summation of the k frames. The resulting frame preserves the information better than the conventional approaches and is experimentally shown to perform better. In this paper, a 3-dimensional deep CNN is proposed to extract spatio-temporal features, followed by a Long Short-Term Memory (LSTM) network to recognize human actions. The proposed 3D CNN architecture is capable of handling videos where the camera is placed at a distance from the performer. Experiments are performed on the KTH, WEIZMANN, and CASIA-B human activity and gait datasets, on which the proposed approach is shown to outperform state-of-the-art deep learning-based techniques. We achieve 95.78%, 95.27%, and 95.27% on the KTH, WEIZMANN, and CASIA-B human action and gait recognition datasets, respectively.
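For illustration only, the Gaussian-weighted frame aggregation described in the abstract can be sketched as below. The window size k, the standard deviation sigma, and all function and variable names are assumptions made for this sketch; they are not taken from the authors' implementation.

    import numpy as np

    def gaussian_weights(k, sigma=1.0):
        # Gaussian weights centred on the middle of a k-frame window, normalised to sum to 1.
        centre = (k - 1) / 2.0
        idx = np.arange(k)
        w = np.exp(-((idx - centre) ** 2) / (2.0 * sigma ** 2))
        return w / w.sum()

    def aggregate_video(frames, k=5, sigma=1.0):
        # Collapse each non-overlapping window of k frames into a single
        # Gaussian-weighted frame; `frames` has shape (T, H, W, C).
        w = gaussian_weights(k, sigma)
        out = []
        for start in range(0, len(frames) - k + 1, k):
            window = frames[start:start + k].astype(np.float64)
            # Weighted summation over the temporal axis of the window.
            out.append(np.tensordot(w, window, axes=(0, 0)))
        return np.stack(out)

    # Example: a 30-frame clip of 112x112 RGB frames becomes 6 aggregated frames.
    clip = np.random.randint(0, 256, size=(30, 112, 112, 3), dtype=np.uint8)
    print(aggregate_video(clip, k=5).shape)  # (6, 112, 112, 3)

The aggregated frames would then be stacked and fed to the 3D CNN in place of the raw frame sequence; the exact window size, stride, and normalisation used in the paper are not specified in this record.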
format Online
Article
Text
id pubmed-9084266
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-9084266 2022-05-10 An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos Basha, S. H. Shabbeer Pulabaigari, Viswanath Mukherjee, Snehasis Multimed Tools Appl Article We propose a novel video sampling scheme for human action recognition in videos, using a Gaussian weighting function. Traditionally, in deep learning-based human activity recognition approaches, either a few random frames or every k-th frame of the video is considered for training the 3D CNN, where k is a small positive integer such as 4, 5, or 6. This kind of sampling reduces the volume of the input data, which speeds up training of the network and also avoids overfitting to some extent, thus enhancing the performance of the 3D CNN model. In the proposed video sampling technique, k consecutive frames of a video are aggregated into a single frame by computing a Gaussian-weighted summation of the k frames. The resulting frame preserves the information better than the conventional approaches and is experimentally shown to perform better. In this paper, a 3-dimensional deep CNN is proposed to extract spatio-temporal features, followed by a Long Short-Term Memory (LSTM) network to recognize human actions. The proposed 3D CNN architecture is capable of handling videos where the camera is placed at a distance from the performer. Experiments are performed on the KTH, WEIZMANN, and CASIA-B human activity and gait datasets, on which the proposed approach is shown to outperform state-of-the-art deep learning-based techniques. We achieve 95.78%, 95.27%, and 95.27% on the KTH, WEIZMANN, and CASIA-B human action and gait recognition datasets, respectively. Springer US 2022-05-09 2022 /pmc/articles/PMC9084266/ /pubmed/35572387 http://dx.doi.org/10.1007/s11042-022-12856-6 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Basha, S. H. Shabbeer
Pulabaigari, Viswanath
Mukherjee, Snehasis
An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos
title An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos
title_full An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos
title_fullStr An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos
title_full_unstemmed An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos
title_short An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos
title_sort information-rich sampling technique over spatio-temporal cnn for classification of human actions in videos
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9084266/
https://www.ncbi.nlm.nih.gov/pubmed/35572387
http://dx.doi.org/10.1007/s11042-022-12856-6
work_keys_str_mv AT bashashshabbeer aninformationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos
AT pulabaigariviswanath aninformationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos
AT mukherjeesnehasis aninformationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos
AT bashashshabbeer informationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos
AT pulabaigariviswanath informationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos
AT mukherjeesnehasis informationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos