
Multi-Scale Attention 3D Convolutional Network for Multimodal Gesture Recognition



Bibliographic Details
Main authors:	Chen, Huizhou; Li, Yunan; Fang, Huijuan; Xin, Wentian; Lu, Zixiang; Miao, Qiguang
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8950910/ https://www.ncbi.nlm.nih.gov/pubmed/35336576 http://dx.doi.org/10.3390/s22062405

collection	PubMed
description	Gesture recognition is an important direction in computer vision research, and information from the hands is crucial to the task. However, current methods typically obtain attention on hand regions from estimated keypoints, which significantly increases both computation time and complexity, and can lose the hand's position information when keypoints are estimated incorrectly. Moreover, for dynamic gesture recognition, attention in the spatial dimension alone is not sufficient. This paper proposes a multi-scale attention 3D convolutional network for gesture recognition that fuses multimodal data. The network applies attention both locally and globally. Local attention uses hand information extracted by a hand detector to focus on the hand region and reduce interference from gesture-irrelevant factors. Global attention is applied in both the human-posture context and the channel context through a dual spatiotemporal attention module. Furthermore, to exploit the differences between data modalities, we designed a multimodal fusion scheme that fuses RGB and depth features. The proposed method is evaluated on the ChaLearn LAP Isolated Gesture Dataset and the Briareo Dataset. Experiments on these two datasets demonstrate the effectiveness of the network and show that it outperforms many state-of-the-art methods.
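The abstract names two ideas that are easy to picture in code: local attention that emphasises the hand region found by a detector, and fusion of RGB and depth information. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It works on a single 2D feature map rather than 3D convolutional features, assumes the hand detector returns a (top, left, bottom, right) box, and substitutes a plain weighted average of softmax scores for the paper's multimodal fusion scheme. The names `hand_box_attention`, `late_fusion`, and `boost` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical simplification: one single-channel H x W feature map per frame,
# instead of the multi-channel spatiotemporal features of a 3D CNN.
FeatureMap = List[List[float]]
# Hypothetical hand-detector output: (top, left, bottom, right), bottom/right exclusive.
Box = Tuple[int, int, int, int]


def hand_box_attention(fmap: FeatureMap, box: Box, boost: float = 1.0) -> FeatureMap:
    """Return a new map in which values inside the hand box are scaled by (1 + boost).

    Values outside the box are kept unchanged (a residual-style mask), so context
    away from the hand is down-weighted relative to the hand but not erased.
    """
    top, left, bottom, right = box
    return [
        [value * (1.0 + boost) if top <= r < bottom and left <= c < right else value
         for c, value in enumerate(row)]
        for r, row in enumerate(fmap)
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits: Sequence[float], depth_logits: Sequence[float],
                rgb_weight: float = 0.5) -> List[float]:
    """Weighted average of per-class probabilities from an RGB and a depth branch.

    Stand-in for illustration only; not the fusion scheme described in the paper.
    """
    if len(rgb_logits) != len(depth_logits):
        raise ValueError("both branches must score the same set of classes")
    p_rgb = softmax(rgb_logits)
    p_depth = softmax(depth_logits)
    return [rgb_weight * a + (1.0 - rgb_weight) * b for a, b in zip(p_rgb, p_depth)]


if __name__ == "__main__":
    frame = [[1.0] * 4 for _ in range(4)]
    attended = hand_box_attention(frame, (1, 1, 3, 3))
    print(attended[1][1], attended[0][0])  # 2.0 1.0
    fused = late_fusion([3.0, 0.0, 0.0], [1.0, 0.5, 0.0])
    print(max(range(len(fused)), key=fused.__getitem__))  # 0
```

In this sketch the residual form (1 + boost) keeps background features available instead of zeroing them, which limits the damage if the detector's box is slightly off; the equal weighting of the two branches is likewise an arbitrary default.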
id	pubmed-8950910
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
source	Sensors (Basel), MDPI, published 2022-03-21
rights	© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


Similar items