Human Interaction Classification in Sliding Video Windows Using Skeleton Data Tracking and Feature Extraction †
Main Authors: Puchała, Sebastian; Kasprzak, Włodzimierz; Piwowarski, Paweł
Format: Online Article Text
Language: English
Published: MDPI, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10384121/ https://www.ncbi.nlm.nih.gov/pubmed/37514573 http://dx.doi.org/10.3390/s23146279
_version_ | 1785081079529472000 |
author | Puchała, Sebastian; Kasprzak, Włodzimierz; Piwowarski, Paweł
author_facet | Puchała, Sebastian; Kasprzak, Włodzimierz; Piwowarski, Paweł
author_sort | Puchała, Sebastian |
collection | PubMed |
description | A “long short-term memory” (LSTM)-based human activity classifier is presented for skeleton data estimated in video frames. A strong feature engineering step precedes the deep neural network processing. The video was analyzed in short-time chunks created by a sliding window. A fixed number of video frames was selected for every chunk and human skeletons were estimated using dedicated software, such as OpenPose or HRNet. The skeleton data for a given window were collected, analyzed, and eventually corrected. A knowledge-aware feature extraction from the corrected skeletons was performed. A deep network model was trained and applied for two-person interaction classification. Three network architectures were developed—single-, double- and triple-channel LSTM networks—and were experimentally evaluated on the interaction subset of the “NTU RGB+D” data set. The most efficient model achieved an interaction classification accuracy of 96%. This performance was compared with the best reported solutions for this set, based on “adaptive graph convolutional networks” (AGCN) and “3D convolutional networks” (e.g., OpenConv3D). The sliding-window strategy was cross-validated on the “UT-Interaction” data set, containing long video clips with many changing interactions. We concluded that a two-step approach to skeleton-based human activity classification (a skeleton feature engineering step followed by a deep neural network model) represents a practical tradeoff between accuracy and computational complexity, due to an early correction of imperfect skeleton data and a knowledge-aware extraction of relational features from the skeletons. |
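The sliding-window chunking and relational feature extraction described in the abstract can be sketched as follows. This is a minimal illustration only: the window size, stride, joint representation, and the choice of inter-skeleton joint distances as the relational feature are assumptions for the sketch, not the parameters or features actually used by the authors.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float]   # (x, y) image coordinates of one estimated joint
Skeleton = List[Joint]        # one person's joints in a single frame

def sliding_windows(frames: Sequence, window_size: int, stride: int) -> List[list]:
    """Split a per-frame sequence (e.g. skeleton estimates from OpenPose or
    HRNet) into fixed-length, possibly overlapping chunks; a trailing chunk
    shorter than `window_size` is dropped."""
    return [list(frames[s:s + window_size])
            for s in range(0, len(frames) - window_size + 1, stride)]

def pairwise_joint_distances(a: Skeleton, b: Skeleton) -> List[float]:
    """One possible relational feature for a two-person interaction:
    Euclidean distances between corresponding joints of the two skeletons."""
    return [math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)]

# Example: 10 dummy frames, windows of 4 frames, sliding by 2 frames
# yields windows covering frames 0-3, 2-5, 4-7, and 6-9.
chunks = sliding_windows(list(range(10)), window_size=4, stride=2)
print(len(chunks))  # 4
```

Each window would then be converted into a fixed-size feature sequence (one feature vector per frame) and fed to the LSTM classifier, so a long clip with changing interactions produces one class prediction per window position.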
format | Online Article Text |
id | pubmed-10384121 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10384121 2023-07-30 Human Interaction Classification in Sliding Video Windows Using Skeleton Data Tracking and Feature Extraction † Puchała, Sebastian; Kasprzak, Włodzimierz; Piwowarski, Paweł Sensors (Basel) Article |
MDPI 2023-07-10 /pmc/articles/PMC10384121/ /pubmed/37514573 http://dx.doi.org/10.3390/s23146279 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Puchała, Sebastian Kasprzak, Włodzimierz Piwowarski, Paweł Human Interaction Classification in Sliding Video Windows Using Skeleton Data Tracking and Feature Extraction † |
title | Human Interaction Classification in Sliding Video Windows Using Skeleton Data Tracking and Feature Extraction † |
title_full | Human Interaction Classification in Sliding Video Windows Using Skeleton Data Tracking and Feature Extraction † |
title_fullStr | Human Interaction Classification in Sliding Video Windows Using Skeleton Data Tracking and Feature Extraction † |
title_full_unstemmed | Human Interaction Classification in Sliding Video Windows Using Skeleton Data Tracking and Feature Extraction † |
title_short | Human Interaction Classification in Sliding Video Windows Using Skeleton Data Tracking and Feature Extraction † |
title_sort | human interaction classification in sliding video windows using skeleton data tracking and feature extraction † |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10384121/ https://www.ncbi.nlm.nih.gov/pubmed/37514573 http://dx.doi.org/10.3390/s23146279 |
work_keys_str_mv | AT puchałasebastian humaninteractionclassificationinslidingvideowindowsusingskeletondatatrackingandfeatureextraction AT kasprzakwłodzimierz humaninteractionclassificationinslidingvideowindowsusingskeletondatatrackingandfeatureextraction AT piwowarskipaweł humaninteractionclassificationinslidingvideowindowsusingskeletondatatrackingandfeatureextraction |