
Multi-scale and attention enhanced graph convolution network for skeleton-based violence action recognition

Graph convolution networks (GCNs) have been widely used in the field of skeleton-based human action recognition. However, it is still difficult to improve recognition performance and reduce parameter complexity. In this paper, a novel multi-scale attention spatiotemporal GCN (MSA-STGCN) is proposed...


Bibliographic Details
Main Authors:	Yang, Huaigang, Ren, Ziliang, Yuan, Huaqiang, Wei, Wenhong, Zhang, Qieshi, Zhang, Zhaolong
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Neuroscience
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9797844/ https://www.ncbi.nlm.nih.gov/pubmed/36590083 http://dx.doi.org/10.3389/fnbot.2022.1091361

_version_	1784860772351868928
author	Yang, Huaigang Ren, Ziliang Yuan, Huaqiang Wei, Wenhong Zhang, Qieshi Zhang, Zhaolong
author_facet	Yang, Huaigang Ren, Ziliang Yuan, Huaqiang Wei, Wenhong Zhang, Qieshi Zhang, Zhaolong
author_sort	Yang, Huaigang
collection	PubMed
description	Graph convolution networks (GCNs) have been widely used in the field of skeleton-based human action recognition. However, it is still difficult to improve recognition performance and reduce parameter complexity. In this paper, a novel multi-scale attention spatiotemporal GCN (MSA-STGCN) is proposed for human violence action recognition by learning spatiotemporal features from four different skeleton modality variants. Firstly, the original joint data are preprocessed to obtain joint position, bone vector, joint motion and bone motion data as inputs to the recognition framework. Then, a spatial multi-scale graph convolution network based on the attention mechanism is constructed to obtain the spatial features from joint nodes, while a temporal graph convolution network in the form of hybrid dilation convolution is designed to enlarge the receptive field of the feature map and capture multi-scale context information. Finally, the specific relationship between the different skeleton data is explored by fusing the information of the multiple streams related to human joints and bones. To evaluate the performance of the proposed MSA-STGCN, a skeleton violence action dataset, Filtered NTU RGB+D, was constructed based on NTU RGB+D 120. We conducted experiments on the constructed Filtered NTU RGB+D and the Kinetics Skeleton 400 datasets to verify the performance of the proposed recognition framework. The proposed method achieves an accuracy of 95.3% on Filtered NTU RGB+D with 1.21M parameters, and accuracies of 36.2% (Top-1) and 58.5% (Top-5) on Kinetics Skeleton 400. The experimental results on these two skeleton datasets show that the proposed recognition framework can effectively recognize violence actions without adding parameters.
format	Online Article Text
id	pubmed-9797844
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9797844 2022-12-30 Multi-scale and attention enhanced graph convolution network for skeleton-based violence action recognition Yang, Huaigang Ren, Ziliang Yuan, Huaqiang Wei, Wenhong Zhang, Qieshi Zhang, Zhaolong Front Neurorobot Neuroscience Graph convolution networks (GCNs) have been widely used in the field of skeleton-based human action recognition. However, it is still difficult to improve recognition performance and reduce parameter complexity. In this paper, a novel multi-scale attention spatiotemporal GCN (MSA-STGCN) is proposed for human violence action recognition by learning spatiotemporal features from four different skeleton modality variants. Firstly, the original joint data are preprocessed to obtain joint position, bone vector, joint motion and bone motion data as inputs to the recognition framework. Then, a spatial multi-scale graph convolution network based on the attention mechanism is constructed to obtain the spatial features from joint nodes, while a temporal graph convolution network in the form of hybrid dilation convolution is designed to enlarge the receptive field of the feature map and capture multi-scale context information. Finally, the specific relationship between the different skeleton data is explored by fusing the information of the multiple streams related to human joints and bones. To evaluate the performance of the proposed MSA-STGCN, a skeleton violence action dataset, Filtered NTU RGB+D, was constructed based on NTU RGB+D 120. We conducted experiments on the constructed Filtered NTU RGB+D and the Kinetics Skeleton 400 datasets to verify the performance of the proposed recognition framework. The proposed method achieves an accuracy of 95.3% on Filtered NTU RGB+D with 1.21M parameters, and accuracies of 36.2% (Top-1) and 58.5% (Top-5) on Kinetics Skeleton 400. The experimental results on these two skeleton datasets show that the proposed recognition framework can effectively recognize violence actions without adding parameters. Frontiers Media S.A. 2022-12-15 /pmc/articles/PMC9797844/ /pubmed/36590083 http://dx.doi.org/10.3389/fnbot.2022.1091361 Text en Copyright © 2022 Yang, Ren, Yuan, Wei, Zhang and Zhang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Neuroscience Yang, Huaigang Ren, Ziliang Yuan, Huaqiang Wei, Wenhong Zhang, Qieshi Zhang, Zhaolong Multi-scale and attention enhanced graph convolution network for skeleton-based violence action recognition
title	Multi-scale and attention enhanced graph convolution network for skeleton-based violence action recognition
title_sort	multi-scale and attention enhanced graph convolution network for skeleton-based violence action recognition
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9797844/ https://www.ncbi.nlm.nih.gov/pubmed/36590083 http://dx.doi.org/10.3389/fnbot.2022.1091361
work_keys_str_mv	AT yanghuaigang multiscaleandattentionenhancedgraphconvolutionnetworkforskeletonbasedviolenceactionrecognition AT renziliang multiscaleandattentionenhancedgraphconvolutionnetworkforskeletonbasedviolenceactionrecognition AT yuanhuaqiang multiscaleandattentionenhancedgraphconvolutionnetworkforskeletonbasedviolenceactionrecognition AT weiwenhong multiscaleandattentionenhancedgraphconvolutionnetworkforskeletonbasedviolenceactionrecognition AT zhangqieshi multiscaleandattentionenhancedgraphconvolutionnetworkforskeletonbasedviolenceactionrecognition AT zhangzhaolong multiscaleandattentionenhancedgraphconvolutionnetworkforskeletonbasedviolenceactionrecognition
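
The description field above says the raw joint data are preprocessed into four inputs: joint position, bone vector, joint motion and bone motion. The following is a minimal sketch of one common way to derive such streams, not the authors' implementation. The toy 5-joint skeleton in `BONES`, the `(frames, joints, channels)` array layout, the function name `build_streams` and the zero-padding of the last motion frame are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy skeleton: (child, parent) joint index pairs; joint 0 is the root.
BONES = [(1, 0), (2, 1), (3, 1), (4, 1)]


def build_streams(joints, bones=BONES):
    """Derive four skeleton streams from joint coordinates.

    joints: array of shape (T, V, C) = (frames, joints, coordinate channels).
    Returns a dict of four arrays, each with the same shape as `joints`.
    """
    # Bone vector: child joint minus parent joint; the root keeps a zero vector.
    bone = np.zeros_like(joints)
    for child, parent in bones:
        bone[:, child] = joints[:, child] - joints[:, parent]

    def motion(x):
        # Frame-to-frame difference; the last frame is zero-padded (assumption).
        m = np.zeros_like(x)
        m[:-1] = x[1:] - x[:-1]
        return m

    return {
        "joint": joints,
        "bone": bone,
        "joint_motion": motion(joints),
        "bone_motion": motion(bone),
    }


if __name__ == "__main__":
    demo = np.random.default_rng(0).normal(size=(8, 5, 3))
    for name, stream in build_streams(demo).items():
        print(name, stream.shape)
```

Keeping all four streams in the same shape means each one could be fed to an identical network branch before fusion, which is the usual motivation for this layout in multi-stream skeleton models.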

