An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos
We propose a novel video sampling scheme for human action recognition in videos, using a Gaussian weighting function. Traditionally, in deep learning-based human activity recognition approaches, either a few random frames or every k-th frame of the video is considered for training the 3D CNN, where k is a small positive integer, like 4, 5, or 6.
Main Authors: | Basha, S. H. Shabbeer; Pulabaigari, Viswanath; Mukherjee, Snehasis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US, 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9084266/ https://www.ncbi.nlm.nih.gov/pubmed/35572387 http://dx.doi.org/10.1007/s11042-022-12856-6 |
_version_ | 1784703574961291264 |
author | Basha, S. H. Shabbeer Pulabaigari, Viswanath Mukherjee, Snehasis |
author_facet | Basha, S. H. Shabbeer Pulabaigari, Viswanath Mukherjee, Snehasis |
author_sort | Basha, S. H. Shabbeer |
collection | PubMed |
description | We propose a novel video sampling scheme for human action recognition in videos, using a Gaussian weighting function. Traditionally, in deep learning-based human activity recognition approaches, either a few random frames or every k-th frame of the video is considered for training the 3D CNN, where k is a small positive integer, like 4, 5, or 6. This kind of sampling reduces the volume of the input data, which speeds up training of the network and also avoids overfitting to some extent, thus enhancing the performance of the 3D CNN model. In the proposed video sampling technique, k consecutive frames of a video are aggregated into a single frame by computing a Gaussian-weighted summation of the k frames. The resulting frame preserves the information better than the conventional approaches and is experimentally shown to perform better. In this paper, a 3-dimensional deep CNN is proposed to extract the spatio-temporal features, followed by a Long Short-Term Memory (LSTM) network to recognize human actions. The proposed 3D CNN architecture is capable of handling videos where the camera is placed at a distance from the performer. Experiments are performed on the KTH, WEIZMANN, and CASIA-B Human Activity and Gait datasets, where the proposed method is shown to outperform state-of-the-art deep learning-based techniques. We achieve 95.78%, 95.27%, and 95.27% over the KTH, WEIZMANN, and CASIA-B human action and gait recognition datasets, respectively. |
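The sampling step described in the abstract — collapsing k consecutive frames into one frame via a Gaussian-weighted summation — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function names, the choice of `sigma`, the centering of the weights on the middle frame, and the use of non-overlapping windows are all assumptions for illustration.

```python
import numpy as np

def gaussian_aggregate(frames, sigma=1.0):
    """Aggregate k consecutive frames into a single frame by a
    Gaussian-weighted summation (weights normalized to sum to 1).
    Assumption: weights are centered on the middle frame of the window."""
    k = len(frames)
    idx = np.arange(k)
    center = (k - 1) / 2.0
    w = np.exp(-((idx - center) ** 2) / (2 * sigma ** 2))
    w /= w.sum()
    frames = np.asarray(frames, dtype=np.float64)
    # Contract the weight vector against the time axis of the frame stack.
    return np.tensordot(w, frames, axes=(0, 0))

def sample_video(video, k=5, sigma=1.0):
    """Slide over the video in non-overlapping windows of k frames,
    producing one aggregated frame per window (hypothetical windowing)."""
    return np.stack([gaussian_aggregate(video[i:i + k], sigma)
                     for i in range(0, len(video) - k + 1, k)])
```

The aggregated clip (one frame per window of k) would then serve as the reduced-volume input to the 3D CNN, in place of random-frame or every-k-th-frame sampling.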
format | Online Article Text |
id | pubmed-9084266 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-90842662022-05-10 An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos Basha, S. H. Shabbeer Pulabaigari, Viswanath Mukherjee, Snehasis Multimed Tools Appl Article
Springer US 2022-05-09 2022 /pmc/articles/PMC9084266/ /pubmed/35572387 http://dx.doi.org/10.1007/s11042-022-12856-6 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Basha, S. H. Shabbeer Pulabaigari, Viswanath Mukherjee, Snehasis An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title | An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title_full | An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title_fullStr | An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title_full_unstemmed | An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title_short | An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title_sort | information-rich sampling technique over spatio-temporal cnn for classification of human actions in videos |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9084266/ https://www.ncbi.nlm.nih.gov/pubmed/35572387 http://dx.doi.org/10.1007/s11042-022-12856-6 |
work_keys_str_mv | AT bashashshabbeer aninformationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos AT pulabaigariviswanath aninformationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos AT mukherjeesnehasis aninformationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos AT bashashshabbeer informationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos AT pulabaigariviswanath informationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos AT mukherjeesnehasis informationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos |