AR3D: Attention Residual 3D Network for Human Action Recognition

At present, deep neural networks for video-based human action recognition fall mainly into two branches: 2D convolutional neural networks (CNNs) and 3D CNNs. In a 2D CNN, the temporal and spatial feature extraction processes are independent of each other, so the internal connection between them is easily ignored, which hurts recognition performance. A 3D CNN can extract the temporal and spatial features of a video sequence simultaneously, but its parameter count grows sharply (every kernel gains a temporal dimension), making the model difficult to train and transfer. To address this problem, this article improves the existing 3D CNN model by combining it with a residual structure and an attention mechanism, and proposes two human action recognition models: the Residual 3D Network (R3D) and the Attention Residual 3D Network (AR3D). First, a shallow feature extraction module is proposed and the ordinary 3D residual structure is improved, which reduces parameters and strengthens the extraction of temporal features. Second, the application of attention mechanisms to human action recognition is explored, and a 3D spatio-temporal attention module is designed to strengthen the extraction of global features of human actions. Finally, to make full use of both the residual structure and the attention mechanism, the Attention Residual 3D Network (AR3D) is proposed, and its two fusion strategies and the corresponding model structures (AR3D_V1, AR3D_V2) are introduced in detail. Experiments show that the fused structures yield varying degrees of performance improvement over either single structure alone.
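The record gives only the abstract, not the model definitions, so the sketch below is an illustrative PyTorch reading of the ideas it names: a 3D residual block, a 3D spatio-temporal attention module, and one way of fusing them. The class names (`Residual3DBlock`, `SpatioTemporalAttention3D`, `AR3DBlockSerial`), the factorized spatial/temporal convolution pair standing in for the paper's unspecified "improved" residual structure, and the sigmoid gate standing in for its attention design are all assumptions, not the authors' architecture.

```python
# Illustrative sketch only -- layer sizes and design details are assumptions.
import torch
import torch.nn as nn


class SpatioTemporalAttention3D(nn.Module):
    """Hypothetical 3D spatio-temporal attention gate: compress channels,
    produce one weight per (time, height, width) position, and rescale."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv3d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, 1, kernel_size=1),
            nn.Sigmoid(),  # attention weight in [0, 1] per position
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        return x * self.gate(x)


class Residual3DBlock(nn.Module):
    """3D residual block; a factorized spatial (1x3x3) + temporal (3x1x1)
    convolution pair stands in for the paper's parameter-reducing variant."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1)),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0)),
            nn.BatchNorm3d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(x + self.body(x))  # identity shortcut


class AR3DBlockSerial(nn.Module):
    """One possible fusion of the two components: residual block first,
    then attention applied to its output."""

    def __init__(self, channels: int):
        super().__init__()
        self.res = Residual3DBlock(channels)
        self.attn = SpatioTemporalAttention3D(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.attn(self.res(x))


if __name__ == "__main__":
    # Two clips of 8 frames, 64 feature channels, 56x56 spatial resolution.
    clip = torch.randn(2, 64, 8, 56, 56)
    print(AR3DBlockSerial(64)(clip).shape)  # torch.Size([2, 64, 8, 56, 56])
```

The abstract mentions two fusion strategies (AR3D_V1, AR3D_V2) without defining them; a natural alternative to the serial fusion above is a parallel one, e.g. `self.res(x) + self.attn(x)`, but which variant corresponds to which name is a guess.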

Bibliographic Details
Main Authors: Dong, Min, Fang, Zhenglin, Li, Yongfa, Bi, Sheng, Chen, Jiangcheng
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7957788/
https://www.ncbi.nlm.nih.gov/pubmed/33670835
http://dx.doi.org/10.3390/s21051656
Collection: PubMed
Record ID: pubmed-7957788
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Sensors (Basel)
Published Online: 2021-02-28
License: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).