Video-Based Human Activity Recognition Using Deep Learning Approaches

Bibliographic Details
Main Authors: Surek, Guilherme Augusto Silva, Seman, Laio Oriel, Stefenon, Stefano Frizzo, Mariani, Viviana Cocco, Coelho, Leandro dos Santos
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386633/
https://www.ncbi.nlm.nih.gov/pubmed/37514677
http://dx.doi.org/10.3390/s23146384
_version_ 1785081715949043712
author Surek, Guilherme Augusto Silva
Seman, Laio Oriel
Stefenon, Stefano Frizzo
Mariani, Viviana Cocco
Coelho, Leandro dos Santos
author_facet Surek, Guilherme Augusto Silva
Seman, Laio Oriel
Stefenon, Stefano Frizzo
Mariani, Viviana Cocco
Coelho, Leandro dos Santos
author_sort Surek, Guilherme Augusto Silva
collection PubMed
description Due to its capacity to gather vast, high-level data about human activity from wearable or stationary sensors, human activity recognition substantially impacts people’s day-to-day lives. Multiple people and objects may be seen acting in a video, dispersed throughout the frame in various places. Because of this, modeling the interactions between many entities in spatial dimensions is necessary for visual reasoning in the action recognition task. The main aim of this paper is to evaluate and map the current scenario of human action recognition in red, green, and blue (RGB) videos based on deep learning models. A residual network (ResNet) and a vision transformer (ViT) architecture with a semi-supervised learning approach are evaluated. DINO (self-DIstillation with NO labels) is used to enhance the potential of the ResNet and the ViT. The evaluated benchmark is the human motion database (HMDB51), which aims to capture the richness and complexity of human actions. The results obtained for video classification with the proposed ViT are promising relative to performance metrics reported in the recent literature. A bi-dimensional ViT with long short-term memory demonstrated strong performance in human action recognition on the HMDB51 dataset, achieving 96.7 ± 0.35% and 41.0 ± 0.27% accuracy (mean ± standard deviation) in the train and test phases, respectively.
format Online
Article
Text
id pubmed-10386633
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10386633 2023-07-30 Video-Based Human Activity Recognition Using Deep Learning Approaches Surek, Guilherme Augusto Silva; Seman, Laio Oriel; Stefenon, Stefano Frizzo; Mariani, Viviana Cocco; Coelho, Leandro dos Santos. Sensors (Basel), Article. MDPI 2023-07-13 /pmc/articles/PMC10386633/ /pubmed/37514677 http://dx.doi.org/10.3390/s23146384 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. https://creativecommons.org/licenses/by/4.0/
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Surek, Guilherme Augusto Silva
Seman, Laio Oriel
Stefenon, Stefano Frizzo
Mariani, Viviana Cocco
Coelho, Leandro dos Santos
Video-Based Human Activity Recognition Using Deep Learning Approaches
title Video-Based Human Activity Recognition Using Deep Learning Approaches
title_full Video-Based Human Activity Recognition Using Deep Learning Approaches
title_fullStr Video-Based Human Activity Recognition Using Deep Learning Approaches
title_full_unstemmed Video-Based Human Activity Recognition Using Deep Learning Approaches
title_short Video-Based Human Activity Recognition Using Deep Learning Approaches
title_sort video-based human activity recognition using deep learning approaches
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386633/
https://www.ncbi.nlm.nih.gov/pubmed/37514677
http://dx.doi.org/10.3390/s23146384
work_keys_str_mv AT surekguilhermeaugustosilva videobasedhumanactivityrecognitionusingdeeplearningapproaches
AT semanlaiooriel videobasedhumanactivityrecognitionusingdeeplearningapproaches
AT stefenonstefanofrizzo videobasedhumanactivityrecognitionusingdeeplearningapproaches
AT marianivivianacocco videobasedhumanactivityrecognitionusingdeeplearningapproaches
AT coelholeandrodossantos videobasedhumanactivityrecognitionusingdeeplearningapproaches