
Human Interaction Classification in Sliding Video Windows Using Skeleton Data Tracking and Feature Extraction †

A “long short-term memory” (LSTM)-based human activity classifier is presented for skeleton data estimated in video frames. A strong feature engineering step precedes the deep neural network processing. The video is analyzed in short-time chunks created by a sliding window: a fixed number of video frames is selected for every chunk, and human skeletons are estimated with dedicated software such as OpenPose or HRNet. The skeleton data for a given window are collected, analyzed, and corrected where necessary, and knowledge-aware features are extracted from the corrected skeletons. A deep network model is then trained and applied to two-person interaction classification. Three network architectures were developed (single-, double-, and triple-channel LSTM networks) and experimentally evaluated on the interaction subset of the “NTU RGB+D” data set. The most efficient model achieved an interaction classification accuracy of 96%. This performance was compared with the best reported solutions for this set, based on “adaptive graph convolutional networks” (AGCN) and “3D convolutional networks” (e.g., OpenConv3D). The sliding-window strategy was also cross-validated on the “UT-Interaction” data set, which contains long video clips with many changing interactions. We conclude that this two-step approach to skeleton-based human activity classification (a skeleton feature engineering step followed by a deep neural network model) offers a practical tradeoff between accuracy and computational complexity, thanks to the early correction of imperfect skeleton data and the knowledge-aware extraction of relational features from the skeletons.
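To make the described pipeline concrete, below is a minimal sketch in PyTorch of the two-step approach the abstract outlines: sliding-window chunking of per-frame skeletons, simple relational ("knowledge-aware") features between the two persons, and a multi-channel LSTM classifier. This is not the authors' published code; the window length, stride, feature choice, tensor shapes, and class count (the NTU RGB+D mutual-action subset has 11 interaction classes) are illustrative assumptions.

```python
# Illustrative sketch only -- not the authors' implementation. Window
# parameters, joint count, features, and class count are assumptions.
import torch
import torch.nn as nn

def sliding_windows(frames, window=30, stride=10):
    """Split a frame sequence into fixed-length, overlapping chunks."""
    return [frames[i:i + window]
            for i in range(0, len(frames) - window + 1, stride)]

def relational_features(skel_a, skel_b):
    """Toy 'knowledge-aware' relational features for one frame:
    all pairwise joint-to-joint distances between the two skeletons.
    skel_a, skel_b: (J, 2) tensors of 2D joint coordinates."""
    diff = skel_a[:, None, :] - skel_b[None, :, :]        # (J, J, 2)
    return torch.linalg.norm(diff, dim=-1).flatten()      # (J*J,)

class TwoChannelLSTM(nn.Module):
    """One LSTM channel per person; feeding the relational features
    to a third LSTM would give the 'triple-channel' variant."""
    def __init__(self, in_dim, hidden=128, n_classes=11):
        super().__init__()
        self.lstm_a = nn.LSTM(in_dim, hidden, batch_first=True)
        self.lstm_b = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, xa, xb):            # each: (B, T, in_dim)
        _, (ha, _) = self.lstm_a(xa)      # ha: (1, B, hidden)
        _, (hb, _) = self.lstm_b(xb)
        h = torch.cat([ha[-1], hb[-1]], dim=-1)
        return self.head(h)               # (B, n_classes) logits

# Usage on random stand-in data: 25 joints, 30-frame window, batch of 4.
J, T, B = 25, 30, 4
model = TwoChannelLSTM(in_dim=J * 2)
xa = torch.randn(B, T, J * 2)   # person A joints, flattened per frame
xb = torch.randn(B, T, J * 2)   # person B joints
logits = model(xa, xb)          # shape (4, 11)
```

In the paper's setting, the corrected skeletons and extracted features would replace the random tensors above, with one window classified per sliding step.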


Bibliographic Details
Main Authors: Puchała, Sebastian; Kasprzak, Włodzimierz; Piwowarski, Paweł
Format: Online Article (Text)
Language: English
Published: MDPI, 10 July 2023
Journal: Sensors (Basel)
Subjects: Article
License: © 2023 by the authors. Open access under the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10384121/
https://www.ncbi.nlm.nih.gov/pubmed/37514573
http://dx.doi.org/10.3390/s23146279