
A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset

Depth video sequence-based deep models for recognizing human actions are scarce compared to RGB and skeleton video sequence-based models. This scarcity limits research advancements based on depth data, as training deep models with small-scale data is challenging. In this work, we propose a sequence classification deep model that uses depth video data for scenarios in which the video data are limited. Unlike methods that summarize the content of each frame into a single class, our method directly classifies a depth video, i.e., a sequence of depth frames. First, the proposed system transforms an input depth video into three sequences of multi-view temporal motion frames. Together with the three temporal motion sequences, the input depth frame sequence offers a four-stream representation of the input depth action video. Next, the DenseNet121 architecture is employed, with ImageNet pre-trained weights, to extract discriminative frame-level action features from the depth and temporal motion frames. The four resulting sets of frame-level feature vectors, one per stream, are fed into four bi-directional long short-term memory (BiLSTM) networks. The temporal features are further analyzed through multi-head self-attention (MHSA) to capture multi-view sequence correlations. Finally, their concatenated outputs are processed through dense layers to classify the input depth video. Experimental results on two small-scale benchmark depth datasets, MSRAction3D and DHA, demonstrate that the proposed framework is efficacious even with insufficient training samples and is superior to existing depth data-based action recognition methods.
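
As a rough illustration of the first step, the sketch below shows one plausible way to derive the three multi-view temporal motion sequences from a depth video: each depth frame is projected onto the front, side, and top Cartesian planes, and consecutive projections are differenced into motion frames. This follows a common construction in the depth-motion literature; the record does not specify the paper's exact transform, so the projection scheme, bin count, and differencing used here are all assumptions.

```python
import numpy as np

def project_views(depth, z_bins=256):
    # Project one (H, W) depth frame onto the front (y, x), side (y, z),
    # and top (z, x) planes. Hypothetical construction; the paper's actual
    # transform may differ.
    h, w = depth.shape
    zmax = depth.max() if depth.max() > 0 else 1.0
    z = np.clip(depth / zmax * (z_bins - 1), 0, z_bins - 1).astype(int)
    front = depth.astype(float)          # front view: the depth map itself
    side = np.zeros((h, z_bins))
    top = np.zeros((z_bins, w))
    ys, xs = np.nonzero(depth)           # foreground pixels only
    side[ys, z[ys, xs]] = 1.0            # binary (y, z) silhouette
    top[z[ys, xs], xs] = 1.0             # binary (z, x) silhouette
    return front, side, top

def motion_sequences(video, z_bins=256):
    # video: (T, H, W) depth frames -> three motion-frame sequences of
    # length T-1, one per view, via absolute consecutive-frame differences.
    views = [project_views(f, z_bins) for f in video]
    seqs = ([], [], [])
    for prev, cur in zip(views, views[1:]):
        for k in range(3):
            seqs[k].append(np.abs(cur[k] - prev[k]))
    return [np.stack(s) for s in seqs]
```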

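The classifier described in the abstract can likewise be sketched in PyTorch, purely as a hedged reading of the text: per-frame DenseNet121 features (ImageNet weights) for each of the four streams, one BiLSTM per stream, multi-head self-attention over each stream's temporal features, and dense layers over the concatenated stream descriptors. Hidden sizes, the head count, dropout, temporal pooling, and whether attention is applied per stream or jointly are not stated in this record and are assumptions here.

```python
import torch
import torch.nn as nn
from torchvision import models

class StreamEncoder(nn.Module):
    # Frame-level CNN features followed by a BiLSTM, for one stream.
    def __init__(self, hidden=256):
        super().__init__()
        backbone = models.densenet121(weights="IMAGENET1K_V1")  # torchvision >= 0.13
        self.cnn = backbone.features          # 1024-channel feature maps
        self.pool = nn.Sequential(nn.ReLU(inplace=True), nn.AdaptiveAvgPool2d(1))
        self.lstm = nn.LSTM(1024, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):                     # x: (B, T, 3, H, W); depth and
        b, t = x.shape[:2]                    # motion frames assumed replicated
        f = self.cnn(x.flatten(0, 1))         # to 3 channels for the backbone
        f = self.pool(f).flatten(1).view(b, t, -1)   # (B, T, 1024)
        out, _ = self.lstm(f)                 # (B, T, 2 * hidden)
        return out

class FourStreamActionNet(nn.Module):
    def __init__(self, num_classes, hidden=256, heads=4):
        super().__init__()
        self.streams = nn.ModuleList([StreamEncoder(hidden) for _ in range(4)])
        d = 2 * hidden
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(4 * d, 512), nn.ReLU(),
                                  nn.Dropout(0.5), nn.Linear(512, num_classes))

    def forward(self, streams):               # list of four (B, T, 3, H, W) tensors
        pooled = []
        for enc, s in zip(self.streams, streams):
            f = enc(s)                        # (B, T, d) temporal features
            a, _ = self.attn(f, f, f)         # per-stream self-attention
            pooled.append(a.mean(dim=1))      # mean over time: (B, d)
        return self.head(torch.cat(pooled, dim=1))   # (B, num_classes)
```

A forward pass takes a list of four (B, T, 3, H, W) tensors, one per stream; mean-pooling over time after attention is one simple fusion choice, not necessarily the paper's.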

Bibliographic Details
Main Authors: Bulbul, Mohammad Farhad, Ullah, Amin, Ali, Hazrat, Kim, Daijin
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9506565/
https://www.ncbi.nlm.nih.gov/pubmed/36146186
http://dx.doi.org/10.3390/s22186841
collection PubMed
id pubmed-9506565
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Sensors (Basel)
publishDate 2022-09-09
license © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).