
Human Interaction Classification in Sliding Video Windows Using Skeleton Data Tracking and Feature Extraction †

A “long short-term memory” (LSTM)-based human activity classifier is presented for skeleton data estimated in video frames. A strong feature engineering step precedes the deep neural network processing. The video is analyzed in short-time chunks created by a sliding window: a fixed number of video frames is selected for every chunk, and human skeletons are estimated with dedicated software such as OpenPose or HRNet. The skeleton data for a given window are collected, analyzed, and corrected where necessary, and knowledge-aware features are extracted from the corrected skeletons. A deep network model is then trained and applied to two-person interaction classification. Three network architectures were developed (single-, double-, and triple-channel LSTM networks) and experimentally evaluated on the interaction subset of the “NTU RGB+D” data set. The most efficient model achieved an interaction classification accuracy of 96%. This performance was compared with the best reported solutions for this set, based on “adaptive graph convolutional networks” (AGCN) and “3D convolutional networks” (e.g., OpenConv3D). The sliding-window strategy was also cross-validated on the “UT-Interaction” data set, which contains long video clips with many changing interactions. We conclude that this two-step approach to skeleton-based human activity classification (a skeleton feature engineering step followed by a deep neural network model) offers a practical tradeoff between accuracy and computational complexity, thanks to the early correction of imperfect skeleton data and the knowledge-aware extraction of relational features from the skeletons.
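To make the described pipeline concrete, below is a minimal sketch in PyTorch of the two-step approach the abstract outlines: sliding-window chunking of per-frame skeletons, simple relational ("knowledge-aware") features between the two persons, and a multi-channel LSTM classifier. This is not the authors' published code; the window length, stride, feature choice, tensor shapes, and class count (the NTU RGB+D mutual-action subset has 11 interaction classes) are illustrative assumptions.

```python
# Illustrative sketch only -- not the authors' implementation. Window
# parameters, joint count, features, and class count are assumptions.
import torch
import torch.nn as nn

def sliding_windows(frames, window=30, stride=10):
    """Split a frame sequence into fixed-length, overlapping chunks."""
    return [frames[i:i + window]
            for i in range(0, len(frames) - window + 1, stride)]

def relational_features(skel_a, skel_b):
    """Toy 'knowledge-aware' relational features for one frame:
    all pairwise joint-to-joint distances between the two skeletons.
    skel_a, skel_b: (J, 2) tensors of 2D joint coordinates."""
    diff = skel_a[:, None, :] - skel_b[None, :, :]        # (J, J, 2)
    return torch.linalg.norm(diff, dim=-1).flatten()      # (J*J,)

class TwoChannelLSTM(nn.Module):
    """One LSTM channel per person; feeding the relational features
    to a third LSTM would give the 'triple-channel' variant."""
    def __init__(self, in_dim, hidden=128, n_classes=11):
        super().__init__()
        self.lstm_a = nn.LSTM(in_dim, hidden, batch_first=True)
        self.lstm_b = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, xa, xb):            # each: (B, T, in_dim)
        _, (ha, _) = self.lstm_a(xa)      # ha: (1, B, hidden)
        _, (hb, _) = self.lstm_b(xb)
        h = torch.cat([ha[-1], hb[-1]], dim=-1)
        return self.head(h)               # (B, n_classes) logits

# Usage on random stand-in data: 25 joints, 30-frame window, batch of 4.
J, T, B = 25, 30, 4
model = TwoChannelLSTM(in_dim=J * 2)
xa = torch.randn(B, T, J * 2)   # person A joints, flattened per frame
xb = torch.randn(B, T, J * 2)   # person B joints
logits = model(xa, xb)          # shape (4, 11)
```

In the paper's setting, the corrected skeletons and extracted features would replace the random tensors above, with one window classified per sliding step.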


Bibliographic Details
Main Authors: Puchała, Sebastian; Kasprzak, Włodzimierz; Piwowarski, Paweł
Format: Online Article (Text)
Language: English
Published: MDPI, 10 July 2023
Journal: Sensors (Basel)
Subjects: Article
License: © 2023 by the authors. Open access under the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10384121/
https://www.ncbi.nlm.nih.gov/pubmed/37514573
http://dx.doi.org/10.3390/s23146279