An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos
We propose a novel video sampling scheme for human action recognition in videos, using a Gaussian weighting function. Traditionally, in deep learning-based human activity recognition approaches, either a few random frames or every k-th frame of the video is considered for training the 3D CNN, where k is a small positive integer, like 4, 5, or 6.
Main Authors: | Basha, S. H. Shabbeer; Pulabaigari, Viswanath; Mukherjee, Snehasis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US, 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9084266/ https://www.ncbi.nlm.nih.gov/pubmed/35572387 http://dx.doi.org/10.1007/s11042-022-12856-6 |
_version_ | 1784703574961291264 |
author | Basha, S. H. Shabbeer Pulabaigari, Viswanath Mukherjee, Snehasis |
author_facet | Basha, S. H. Shabbeer Pulabaigari, Viswanath Mukherjee, Snehasis |
author_sort | Basha, S. H. Shabbeer |
collection | PubMed |
description | We propose a novel video sampling scheme for human action recognition in videos, using a Gaussian weighting function. Traditionally, in deep learning-based human activity recognition approaches, either a few random frames or every k-th frame of the video is considered for training the 3D CNN, where k is a small positive integer, like 4, 5, or 6. This kind of sampling reduces the volume of the input data, which speeds up training of the network and also avoids overfitting to some extent, thus enhancing the performance of the 3D CNN model. In the proposed video sampling technique, k consecutive frames of a video are aggregated into a single frame by computing a Gaussian-weighted summation of the k frames. The resulting frame preserves the information better than the conventional approaches and is experimentally shown to perform better. In this paper, a 3-dimensional deep CNN is proposed to extract the spatio-temporal features, followed by a Long Short-Term Memory (LSTM) network to recognize human actions. The proposed 3D CNN architecture is capable of handling videos where the camera is placed at a distance from the performer. Experiments are performed on the KTH, WEIZMANN, and CASIA-B Human Activity and Gait datasets, where the proposed method is shown to outperform state-of-the-art deep learning-based techniques. We achieve 95.78%, 95.27%, and 95.27% over the KTH, WEIZMANN, and CASIA-B human action and gait recognition datasets, respectively. |
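The sampling step described in the abstract — collapsing k consecutive frames into one frame via a Gaussian-weighted summation — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function names, the choice of `sigma`, the centering of the weights on the middle frame, and the use of non-overlapping windows are all assumptions for illustration.

```python
import numpy as np

def gaussian_aggregate(frames, sigma=1.0):
    """Aggregate k consecutive frames into a single frame by a
    Gaussian-weighted summation (weights normalized to sum to 1).
    Assumption: weights are centered on the middle frame of the window."""
    k = len(frames)
    idx = np.arange(k)
    center = (k - 1) / 2.0
    w = np.exp(-((idx - center) ** 2) / (2 * sigma ** 2))
    w /= w.sum()
    frames = np.asarray(frames, dtype=np.float64)
    # Contract the weight vector against the time axis of the frame stack.
    return np.tensordot(w, frames, axes=(0, 0))

def sample_video(video, k=5, sigma=1.0):
    """Slide over the video in non-overlapping windows of k frames,
    producing one aggregated frame per window (hypothetical windowing)."""
    return np.stack([gaussian_aggregate(video[i:i + k], sigma)
                     for i in range(0, len(video) - k + 1, k)])
```

The aggregated clip (one frame per window of k) would then serve as the reduced-volume input to the 3D CNN, in place of random-frame or every-k-th-frame sampling.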
format | Online Article Text |
id | pubmed-9084266 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-90842662022-05-10 An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos Basha, S. H. Shabbeer Pulabaigari, Viswanath Mukherjee, Snehasis Multimed Tools Appl Article
Springer US 2022-05-09 2022 /pmc/articles/PMC9084266/ /pubmed/35572387 http://dx.doi.org/10.1007/s11042-022-12856-6 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Basha, S. H. Shabbeer Pulabaigari, Viswanath Mukherjee, Snehasis An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title | An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title_full | An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title_fullStr | An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title_full_unstemmed | An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title_short | An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos |
title_sort | information-rich sampling technique over spatio-temporal cnn for classification of human actions in videos |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9084266/ https://www.ncbi.nlm.nih.gov/pubmed/35572387 http://dx.doi.org/10.1007/s11042-022-12856-6 |
work_keys_str_mv | AT bashashshabbeer aninformationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos AT pulabaigariviswanath aninformationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos AT mukherjeesnehasis aninformationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos AT bashashshabbeer informationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos AT pulabaigariviswanath informationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos AT mukherjeesnehasis informationrichsamplingtechniqueoverspatiotemporalcnnforclassificationofhumanactionsinvideos |