Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition

Bibliographic Details
Main Authors: Zhang, Haiping, Zhang, Xinhao, Yu, Dongjin, Guan, Liming, Wang, Dongjing, Zhou, Fuxing, Zhang, Wanjun
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10303820/
https://www.ncbi.nlm.nih.gov/pubmed/37420580
http://dx.doi.org/10.3390/s23125414
_version_ 1785065365510815744
author Zhang, Haiping
Zhang, Xinhao
Yu, Dongjin
Guan, Liming
Wang, Dongjing
Zhou, Fuxing
Zhang, Wanjun
author_facet Zhang, Haiping
Zhang, Xinhao
Yu, Dongjin
Guan, Liming
Wang, Dongjing
Zhou, Fuxing
Zhang, Wanjun
author_sort Zhang, Haiping
collection PubMed
description Graph convolutional networks are widely used in skeleton-based action recognition because of their strong ability to model non-Euclidean data. Conventional multi-scale temporal convolution uses several fixed-size convolution kernels or dilation rates at each layer of the network, but we argue that different layers and datasets require different receptive fields. We therefore optimize traditional multi-scale temporal convolution with multi-scale adaptive convolution kernels and dilation rates selected by a simple and effective self-attention mechanism, allowing each network layer to adaptively choose kernel sizes and dilation rates rather than keeping them fixed. Moreover, the effective receptive field of a simple residual connection is small, and deep residual networks contain considerable redundancy, which leads to a loss of context when aggregating spatio-temporal information. This article introduces a feature fusion mechanism that replaces the residual connection between the initial features and the temporal module outputs, effectively addressing both context aggregation and initial feature fusion. We propose a multi-modality adaptive feature fusion framework (MMAFF) that enlarges the receptive field in both the spatial and temporal dimensions. Concretely, the features extracted by the spatial module are fed into the adaptive temporal fusion module to extract multi-scale skeleton features in the spatial and temporal dimensions simultaneously. In addition, building on the current multi-stream approach, we use a limb stream to uniformly process correlated data from multiple modalities. Extensive experiments show that our model achieves results competitive with state-of-the-art methods on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets.
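To make the adaptive temporal convolution described above concrete, here is a minimal PyTorch sketch of attention-weighted multi-scale temporal convolution over skeleton features. The branch configuration (kernel sizes and dilation rates), the pooling-based gate, and all module names are illustrative assumptions based only on the abstract, not the authors' released implementation.

```python
# Minimal sketch (PyTorch) of attention-weighted multi-scale temporal
# convolution over skeleton features shaped (N, C, T, V):
# N = batch, C = channels, T = frames, V = joints.
# Branch kernel sizes/dilations and the gating design are illustrative
# assumptions, not the paper's released code.
import torch
import torch.nn as nn

class AdaptiveMultiScaleTemporalConv(nn.Module):
    def __init__(self, channels, branches=((3, 1), (3, 2), (5, 1), (5, 2))):
        super().__init__()
        self.branches = nn.ModuleList()
        for k, d in branches:
            pad = (k - 1) * d // 2  # keep the temporal length unchanged
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=(k, 1),
                          padding=(pad, 0), dilation=(d, 1)),
                nn.BatchNorm2d(channels),
            ))
        # Simple attention: global pooling -> per-branch weights, so each
        # layer can emphasize a different temporal receptive field.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, len(branches)),
            nn.Softmax(dim=1),
        )

    def forward(self, x):                       # x: (N, C, T, V)
        w = self.gate(x)                        # (N, num_branches)
        outs = torch.stack([b(x) for b in self.branches], dim=1)  # (N, B, C, T, V)
        return (w[:, :, None, None, None] * outs).sum(dim=1)      # (N, C, T, V)

x = torch.randn(2, 64, 300, 25)   # e.g. NTU-RGB+D: 300 frames, 25 joints
y = AdaptiveMultiScaleTemporalConv(64)(x)
print(y.shape)                    # torch.Size([2, 64, 300, 25])
```

Stacking such a layer after a spatial graph-convolution block lets each depth weight its own mix of temporal receptive fields, which is the behavior the abstract argues fixed kernel sizes and dilation rates cannot provide.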
format Online
Article
Text
id pubmed-10303820
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10303820 2023-06-29 Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition Zhang, Haiping Zhang, Xinhao Yu, Dongjin Guan, Liming Wang, Dongjing Zhou, Fuxing Zhang, Wanjun Sensors (Basel) Article MDPI 2023-06-07 /pmc/articles/PMC10303820/ /pubmed/37420580 http://dx.doi.org/10.3390/s23125414 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Zhang, Haiping
Zhang, Xinhao
Yu, Dongjin
Guan, Liming
Wang, Dongjing
Zhou, Fuxing
Zhang, Wanjun
Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition
title Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition
title_full Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition
title_fullStr Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition
title_full_unstemmed Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition
title_short Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition
title_sort multi-modality adaptive feature fusion graph convolutional network for skeleton-based action recognition
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10303820/
https://www.ncbi.nlm.nih.gov/pubmed/37420580
http://dx.doi.org/10.3390/s23125414
work_keys_str_mv AT zhanghaiping multimodalityadaptivefeaturefusiongraphconvolutionalnetworkforskeletonbasedactionrecognition
AT zhangxinhao multimodalityadaptivefeaturefusiongraphconvolutionalnetworkforskeletonbasedactionrecognition
AT yudongjin multimodalityadaptivefeaturefusiongraphconvolutionalnetworkforskeletonbasedactionrecognition
AT guanliming multimodalityadaptivefeaturefusiongraphconvolutionalnetworkforskeletonbasedactionrecognition
AT wangdongjing multimodalityadaptivefeaturefusiongraphconvolutionalnetworkforskeletonbasedactionrecognition
AT zhoufuxing multimodalityadaptivefeaturefusiongraphconvolutionalnetworkforskeletonbasedactionrecognition
AT zhangwanjun multimodalityadaptivefeaturefusiongraphconvolutionalnetworkforskeletonbasedactionrecognition