
Real-Time Monocular Skeleton-Based Hand Gesture Recognition Using 3D-Jointsformer


Bibliographic details
Main authors:	Zhong, Enmin; del-Blanco, Carlos R.; Berjón, Daniel; Jaureguizar, Fernando; García, Narciso
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10459010/ https://www.ncbi.nlm.nih.gov/pubmed/37631602 http://dx.doi.org/10.3390/s23167066

author	Zhong, Enmin; del-Blanco, Carlos R.; Berjón, Daniel; Jaureguizar, Fernando; García, Narciso
collection	PubMed
description	Automatic hand gesture recognition in video sequences has widespread applications, ranging from home automation to sign language interpretation and clinical operations. The primary challenge lies in achieving real-time recognition while managing temporal dependencies that can impact performance. Existing methods employ 3D convolutional or Transformer-based architectures with hand skeleton estimation, but both have limitations. To address these challenges, a hybrid approach that combines 3D Convolutional Neural Networks (3D-CNNs) and Transformers is proposed. A 3D-CNN first computes high-level semantic skeleton embeddings that capture the local spatial and temporal characteristics of hand gestures. A Transformer network with a self-attention mechanism then captures long-range temporal dependencies in the skeleton sequence. Evaluation on the Briareo and Multimodal Hand Gesture datasets yielded accuracies of 95.49% and 97.25%, respectively. Notably, the approach runs in real time on a standard CPU, unlike methods that require specialized GPUs. Its combination of real-time efficiency and high accuracy demonstrates its superiority over existing state-of-the-art methods. In summary, the hybrid 3D-CNN and Transformer approach addresses both the challenge of real-time recognition and that of efficiently handling temporal dependencies, outperforming existing methods in both accuracy and speed.
format	Online Article Text
id	pubmed-10459010
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10459010, 2023-08-27. Sensors (Basel), Article. MDPI, 2023-08-10. /pmc/articles/PMC10459010/ /pubmed/37631602 http://dx.doi.org/10.3390/s23167066. Text, en. © 2023 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Real-Time Monocular Skeleton-Based Hand Gesture Recognition Using 3D-Jointsformer
topic	Article
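
The description outlines a two-stage data flow: a local spatio-temporal feature extractor over the hand-skeleton sequence, followed by self-attention across time to capture long-range dependencies. The sketch below is a minimal, untrained illustration of that flow in plain Python; it is not the authors' 3D-Jointsformer. Assumptions: a 1D temporal convolution stands in for the 3D-CNN stage, a single scaled dot-product attention head with a residual connection stands in for the Transformer encoder, and the 21-joint hand layout, layer sizes, class count, and all function names (`temporal_conv`, `self_attention`, `init_params`, `recognize`) are hypothetical.

```python
import math
import random


def temporal_conv(frames, kernel):
    """Hypothetical stand-in for the 3D-CNN stage: a sliding-window filter over
    time. frames is a list of T flattened skeleton vectors; kernel holds K
    scalar weights. Out-of-range frames are treated as zeros, so T is kept."""
    T, D, half = len(frames), len(frames[0]), len(kernel) // 2
    out = []
    for t in range(T):
        acc = [0.0] * D
        for k, w in enumerate(kernel):
            src = t + k - half
            if 0 <= src < T:
                for d in range(D):
                    acc[d] += w * frames[src][d]
        out.append(acc)
    return out


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the time axis."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    scale = math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        # Every frame attends to every other frame, however far apart in time.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K])
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(num_joints, d_model, num_classes, seed=0):
    """Random (untrained) weights; sizes are illustrative only."""
    rng = random.Random(seed)
    D = num_joints * 3  # (x, y, z) per joint
    return {
        "kernel": [0.25, 0.5, 0.25],
        "W_embed": random_matrix(d_model, D, rng),
        "Wq": random_matrix(d_model, d_model, rng),
        "Wk": random_matrix(d_model, d_model, rng),
        "Wv": random_matrix(d_model, d_model, rng),
        "W_cls": random_matrix(num_classes, d_model, rng),
    }


def recognize(sequence, params):
    """sequence: T frames, each a list of joints given as (x, y, z).
    Returns class probabilities for the whole clip."""
    frames = [[c for joint in frame for c in joint] for frame in sequence]
    local = temporal_conv(frames, params["kernel"])             # local features
    embedded = [matvec(params["W_embed"], x) for x in local]     # per-frame embedding
    attended = self_attention(embedded, params["Wq"], params["Wk"], params["Wv"])
    T, d = len(attended), len(attended[0])
    # Residual connection, then mean pooling over time.
    pooled = [sum(embedded[t][j] + attended[t][j] for t in range(T)) / T for j in range(d)]
    return softmax(matvec(params["W_cls"], pooled))


if __name__ == "__main__":
    rng = random.Random(42)
    T, J = 16, 21  # 16 frames, 21 hand joints (assumed layout)
    seq = [[(rng.random(), rng.random(), rng.random()) for _ in range(J)] for _ in range(T)]
    probs = recognize(seq, init_params(J, d_model=16, num_classes=5))
    print(len(probs), max(range(len(probs)), key=probs.__getitem__))
```

The split mirrors the idea in the description: the convolution only mixes neighbouring frames, while the attention step relates every frame to every other one, so distant parts of a gesture can influence each other in a single layer. Because the weights are random, the predicted class here is meaningless; only the data flow is illustrated.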


Similar items