Video-Based Human Activity Recognition Using Deep Learning Approaches
Because of its capacity to gather vast, high-level data about human activity from wearable or stationary sensors, human activity recognition substantially impacts people’s day-to-day lives. Multiple people and objects may appear in a video, dispersed across the frame in various places, so visual reasoning in the action recognition task requires modeling the spatial interactions between many entities. The main aim of this paper is to evaluate and map the current landscape of human action recognition in red, green, and blue (RGB) videos based on deep learning models. A residual network (ResNet) and a vision transformer (ViT) architecture with a semi-supervised learning approach are evaluated, and DINO (self-DIstillation with NO labels) is used to enhance the potential of both the ResNet and the ViT. The evaluation benchmark is the human motion database (HMDB51), which aims to better capture the richness and complexity of human actions. The video classification results obtained with the proposed ViT are promising with respect to performance metrics and results from the recent literature. A bi-dimensional ViT combined with long short-term memory (LSTM) demonstrated strong performance in human action recognition on the HMDB51 dataset, achieving accuracies (mean ± standard deviation) of 96.7 ± 0.35% in the training phase and 41.0 ± 0.27% in the test phase.
Main Authors: | Surek, Guilherme Augusto Silva; Seman, Laio Oriel; Stefenon, Stefano Frizzo; Mariani, Viviana Cocco; Coelho, Leandro dos Santos |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386633/ https://www.ncbi.nlm.nih.gov/pubmed/37514677 http://dx.doi.org/10.3390/s23146384 |
_version_ | 1785081715949043712 |
---|---|
author | Surek, Guilherme Augusto Silva; Seman, Laio Oriel; Stefenon, Stefano Frizzo; Mariani, Viviana Cocco; Coelho, Leandro dos Santos
author_facet | Surek, Guilherme Augusto Silva; Seman, Laio Oriel; Stefenon, Stefano Frizzo; Mariani, Viviana Cocco; Coelho, Leandro dos Santos
author_sort | Surek, Guilherme Augusto Silva |
collection | PubMed |
description | Because of its capacity to gather vast, high-level data about human activity from wearable or stationary sensors, human activity recognition substantially impacts people’s day-to-day lives. Multiple people and objects may appear in a video, dispersed across the frame in various places, so visual reasoning in the action recognition task requires modeling the spatial interactions between many entities. The main aim of this paper is to evaluate and map the current landscape of human action recognition in red, green, and blue (RGB) videos based on deep learning models. A residual network (ResNet) and a vision transformer (ViT) architecture with a semi-supervised learning approach are evaluated, and DINO (self-DIstillation with NO labels) is used to enhance the potential of both the ResNet and the ViT. The evaluation benchmark is the human motion database (HMDB51), which aims to better capture the richness and complexity of human actions. The video classification results obtained with the proposed ViT are promising with respect to performance metrics and results from the recent literature. A bi-dimensional ViT combined with long short-term memory (LSTM) demonstrated strong performance in human action recognition on the HMDB51 dataset, achieving accuracies (mean ± standard deviation) of 96.7 ± 0.35% in the training phase and 41.0 ± 0.27% in the test phase. |
format | Online Article Text |
id | pubmed-10386633 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10386633 2023-07-30 Video-Based Human Activity Recognition Using Deep Learning Approaches Surek, Guilherme Augusto Silva; Seman, Laio Oriel; Stefenon, Stefano Frizzo; Mariani, Viviana Cocco; Coelho, Leandro dos Santos Sensors (Basel) Article Because of its capacity to gather vast, high-level data about human activity from wearable or stationary sensors, human activity recognition substantially impacts people’s day-to-day lives. Multiple people and objects may appear in a video, dispersed across the frame in various places, so visual reasoning in the action recognition task requires modeling the spatial interactions between many entities. The main aim of this paper is to evaluate and map the current landscape of human action recognition in red, green, and blue (RGB) videos based on deep learning models. A residual network (ResNet) and a vision transformer (ViT) architecture with a semi-supervised learning approach are evaluated, and DINO (self-DIstillation with NO labels) is used to enhance the potential of both the ResNet and the ViT. The evaluation benchmark is the human motion database (HMDB51), which aims to better capture the richness and complexity of human actions. The video classification results obtained with the proposed ViT are promising with respect to performance metrics and results from the recent literature. A bi-dimensional ViT combined with long short-term memory (LSTM) demonstrated strong performance in human action recognition on the HMDB51 dataset, achieving accuracies (mean ± standard deviation) of 96.7 ± 0.35% in the training phase and 41.0 ± 0.27% in the test phase. MDPI 2023-07-13 /pmc/articles/PMC10386633/ /pubmed/37514677 http://dx.doi.org/10.3390/s23146384 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Surek, Guilherme Augusto Silva; Seman, Laio Oriel; Stefenon, Stefano Frizzo; Mariani, Viviana Cocco; Coelho, Leandro dos Santos Video-Based Human Activity Recognition Using Deep Learning Approaches
title | Video-Based Human Activity Recognition Using Deep Learning Approaches |
title_full | Video-Based Human Activity Recognition Using Deep Learning Approaches |
title_fullStr | Video-Based Human Activity Recognition Using Deep Learning Approaches |
title_full_unstemmed | Video-Based Human Activity Recognition Using Deep Learning Approaches |
title_short | Video-Based Human Activity Recognition Using Deep Learning Approaches |
title_sort | video-based human activity recognition using deep learning approaches |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386633/ https://www.ncbi.nlm.nih.gov/pubmed/37514677 http://dx.doi.org/10.3390/s23146384 |
work_keys_str_mv | AT surekguilhermeaugustosilva videobasedhumanactivityrecognitionusingdeeplearningapproaches AT semanlaiooriel videobasedhumanactivityrecognitionusingdeeplearningapproaches AT stefenonstefanofrizzo videobasedhumanactivityrecognitionusingdeeplearningapproaches AT marianivivianacocco videobasedhumanactivityrecognitionusingdeeplearningapproaches AT coelholeandrodossantos videobasedhumanactivityrecognitionusingdeeplearningapproaches |
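The architecture the description above reports results for, a two-dimensional ViT that extracts per-frame features which an LSTM then pools over time into a clip-level prediction, can be illustrated roughly as below. This is a minimal, hypothetical sketch, not the authors' implementation: the class name `ViTLSTMClassifier`, the `feat_dim` and `hidden_dim` values, and the choice of a DINO-pretrained torch.hub backbone are assumptions for illustration only.

```python
# Minimal sketch (not the paper's code): per-frame ViT embeddings pooled by an
# LSTM for clip-level action classification on HMDB51 (51 classes).
import torch
import torch.nn as nn

class ViTLSTMClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int = 768,
                 hidden_dim: int = 512, num_classes: int = 51):
        super().__init__()
        self.backbone = backbone                      # maps (N, 3, H, W) -> (N, feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, H, W) RGB frames
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.reshape(b * t, c, h, w))  # (b*t, feat_dim)
        feats = feats.reshape(b, t, -1)                      # (b, t, feat_dim)
        _, (h_n, _) = self.lstm(feats)                       # final hidden state summarizes the clip
        return self.head(h_n[-1])                            # (b, num_classes) logits

# Usage with a publicly available DINO-pretrained ViT-B/16, whose forward pass
# returns 768-dim CLS features; assumes network access to torch.hub.
backbone = torch.hub.load("facebookresearch/dino:main", "dino_vitb16")
model = ViTLSTMClassifier(backbone).eval()
with torch.no_grad():
    logits = model(torch.randn(2, 16, 3, 224, 224))  # two 16-frame clips
print(logits.shape)  # torch.Size([2, 51])
```

Taking the LSTM's final hidden state is one simple temporal-pooling choice; averaging per-frame predictions or attention pooling are common alternatives. The large gap between the training and test accuracies quoted in the description (96.7% vs. 41.0%) is consistent with overfitting on a dataset as small as HMDB51, which is why a pretrained, possibly frozen, backbone is attractive in this setting.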