A Deep Learning Framework for Recognizing Both Static and Dynamic Gestures
Intuitive user interfaces are indispensable to interact with the human centric smart environments. In this paper, we propose a unified framework that recognizes both static and dynamic gestures, using simple RGB vision (without depth sensing). This feature makes it suitable for inexpensive human-robot interaction in social or industrial settings. An illustrative sketch of the described pipeline appears below, after the record fields.
Main Authors: | Mazhar, Osama; Ramdani, Sofiane; Cherubini, Andrea |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2021 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8004797/ https://www.ncbi.nlm.nih.gov/pubmed/33806741 http://dx.doi.org/10.3390/s21062227 |
_version_ | 1783671985347231744 |
---|---|
author | Mazhar, Osama Ramdani, Sofiane Cherubini, Andrea |
author_facet | Mazhar, Osama Ramdani, Sofiane Cherubini, Andrea |
author_sort | Mazhar, Osama |
collection | PubMed |
description | Intuitive user interfaces are indispensable to interact with the human centric smart environments. In this paper, we propose a unified framework that recognizes both static and dynamic gestures, using simple RGB vision (without depth sensing). This feature makes it suitable for inexpensive human-robot interaction in social or industrial settings. We employ a pose-driven spatial attention strategy, which guides our proposed Static and Dynamic gestures Network—StaDNet. From the image of the human upper body, we estimate his/her depth, along with the region-of-interest around his/her hands. The Convolutional Neural Network (CNN) in StaDNet is fine-tuned on a background-substituted hand gestures dataset. It is utilized to detect 10 static gestures for each hand as well as to obtain the hand image-embeddings. These are subsequently fused with the augmented pose vector and then passed to the stacked Long Short-Term Memory blocks. Thus, human-centred frame-wise information from the augmented pose vector and from the left/right hands image-embeddings are aggregated in time to predict the dynamic gestures of the performing person. In a number of experiments, we show that the proposed approach surpasses the state-of-the-art results on the large-scale Chalearn 2016 dataset. Moreover, we transfer the knowledge learned through the proposed methodology to the Praxis gestures dataset, and the obtained results also outscore the state-of-the-art on this dataset. |
format | Online Article Text |
id | pubmed-8004797 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-80047972021-03-29 A Deep Learning Framework for Recognizing Both Static and Dynamic Gestures Mazhar, Osama Ramdani, Sofiane Cherubini, Andrea Sensors (Basel) Article Intuitive user interfaces are indispensable to interact with the human centric smart environments. In this paper, we propose a unified framework that recognizes both static and dynamic gestures, using simple RGB vision (without depth sensing). This feature makes it suitable for inexpensive human-robot interaction in social or industrial settings. We employ a pose-driven spatial attention strategy, which guides our proposed Static and Dynamic gestures Network—StaDNet. From the image of the human upper body, we estimate his/her depth, along with the region-of-interest around his/her hands. The Convolutional Neural Network (CNN) in StaDNet is fine-tuned on a background-substituted hand gestures dataset. It is utilized to detect 10 static gestures for each hand as well as to obtain the hand image-embeddings. These are subsequently fused with the augmented pose vector and then passed to the stacked Long Short-Term Memory blocks. Thus, human-centred frame-wise information from the augmented pose vector and from the left/right hands image-embeddings are aggregated in time to predict the dynamic gestures of the performing person. In a number of experiments, we show that the proposed approach surpasses the state-of-the-art results on the large-scale Chalearn 2016 dataset. Moreover, we transfer the knowledge learned through the proposed methodology to the Praxis gestures dataset, and the obtained results also outscore the state-of-the-art on this dataset. MDPI 2021-03-23 /pmc/articles/PMC8004797/ /pubmed/33806741 http://dx.doi.org/10.3390/s21062227 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Mazhar, Osama Ramdani, Sofiane Cherubini, Andrea A Deep Learning Framework for Recognizing Both Static and Dynamic Gestures |
title | A Deep Learning Framework for Recognizing Both Static and Dynamic Gestures |
title_full | A Deep Learning Framework for Recognizing Both Static and Dynamic Gestures |
title_fullStr | A Deep Learning Framework for Recognizing Both Static and Dynamic Gestures |
title_full_unstemmed | A Deep Learning Framework for Recognizing Both Static and Dynamic Gestures |
title_short | A Deep Learning Framework for Recognizing Both Static and Dynamic Gestures |
title_sort | deep learning framework for recognizing both static and dynamic gestures |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8004797/ https://www.ncbi.nlm.nih.gov/pubmed/33806741 http://dx.doi.org/10.3390/s21062227 |
work_keys_str_mv | AT mazharosama adeeplearningframeworkforrecognizingbothstaticanddynamicgestures AT ramdanisofiane adeeplearningframeworkforrecognizingbothstaticanddynamicgestures AT cherubiniandrea adeeplearningframeworkforrecognizingbothstaticanddynamicgestures AT mazharosama deeplearningframeworkforrecognizingbothstaticanddynamicgestures AT ramdanisofiane deeplearningframeworkforrecognizingbothstaticanddynamicgestures AT cherubiniandrea deeplearningframeworkforrecognizingbothstaticanddynamicgestures |
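The abstract above outlines the processing pipeline: per-hand CNN image-embeddings, fusion with an augmented pose vector, and stacked LSTMs that aggregate the fused frame-wise features in time. The PyTorch sketch below only illustrates that data flow under stated assumptions; the class name GestureNetSketch, the toy convolutional backbone, the pose-vector size, the hidden sizes, and the class counts are hypothetical and are not taken from the paper or its code.

```python
import torch
import torch.nn as nn


class GestureNetSketch(nn.Module):
    """Minimal, hypothetical sketch of the pipeline described in the abstract:
    per-hand CNN image-embeddings and an augmented pose vector are fused
    frame-wise, then aggregated in time by stacked LSTMs. The backbone,
    layer sizes and class counts are illustrative assumptions, not the
    authors' published StaDNet configuration."""

    def __init__(self, pose_dim=30, embed_dim=256, hidden_dim=512,
                 n_static=10, n_dynamic=249):
        super().__init__()
        # Toy stand-in for the fine-tuned CNN that embeds each hand crop.
        self.hand_cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Static-gesture head: the abstract mentions 10 static gestures per hand.
        self.static_head = nn.Linear(embed_dim, n_static)
        # Stacked LSTMs aggregate the fused frame-wise features over time.
        self.lstm = nn.LSTM(2 * embed_dim + pose_dim, hidden_dim,
                            num_layers=2, batch_first=True)
        self.dynamic_head = nn.Linear(hidden_dim, n_dynamic)

    def forward(self, left_hand, right_hand, pose):
        # left_hand, right_hand: (B, T, 3, H, W) hand crops; pose: (B, T, pose_dim)
        B, T = pose.shape[:2]
        le = self.hand_cnn(left_hand.flatten(0, 1)).view(B, T, -1)
        re = self.hand_cnn(right_hand.flatten(0, 1)).view(B, T, -1)
        static_left = self.static_head(le)     # per-frame static-gesture logits, left hand
        static_right = self.static_head(re)    # per-frame static-gesture logits, right hand
        fused = torch.cat([le, re, pose], dim=-1)
        seq, _ = self.lstm(fused)
        dynamic = self.dynamic_head(seq[:, -1])  # dynamic-gesture logits from last frame
        return static_left, static_right, dynamic


# Example with dummy tensors: a batch of 2 clips, 16 frames, 64x64 hand crops.
model = GestureNetSketch()
left = torch.randn(2, 16, 3, 64, 64)
right = torch.randn(2, 16, 3, 64, 64)
pose = torch.randn(2, 16, 30)
s_l, s_r, dyn = model(left, right, pose)
print(s_l.shape, s_r.shape, dyn.shape)  # (2, 16, 10) (2, 16, 10) (2, 249)
```

Running the dummy-tensor example at the bottom confirms the intended shapes: per-frame static-gesture logits for each hand and one dynamic-gesture prediction per clip, mirroring the frame-wise fusion and temporal aggregation the abstract describes.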