A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention

Temporal modeling is key to action recognition in videos, but traditional 2D CNNs do not capture temporal relationships well. 3D CNNs can achieve good performance, but are computationally intensive and impractical on existing devices. To address these problems, we design a generic and effe...

Bibliographic Details
Main authors:	Yang, Qi; Lu, Tongwei; Zhou, Huabing
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8947561/ https://www.ncbi.nlm.nih.gov/pubmed/35327879 http://dx.doi.org/10.3390/e24030368

_version_	1784674469133942784
author	Yang, Qi; Lu, Tongwei; Zhou, Huabing
collection	PubMed
description	Temporal modeling is key to action recognition in videos, but traditional 2D CNNs do not capture temporal relationships well. 3D CNNs can achieve good performance, but are computationally intensive and impractical on existing devices. To address these problems, we design a generic and effective module called spatio-temporal motion network (SMNet). SMNet maintains the complexity of 2D CNNs and reduces the computational effort of the algorithm while achieving performance comparable to 3D CNNs. SMNet contains a spatio-temporal excitation module (SE) and a motion excitation module (ME). The SE module uses group convolution to fuse temporal information, which reduces the number of parameters in the network, and uses spatial attention to extract spatial information. The ME module uses the difference between adjacent frames to extract feature-level motion patterns, which effectively encodes motion features and helps identify actions efficiently. We use ResNet-50 as the backbone network and insert SMNet into the residual blocks to form a simple and effective action network. Experimental results on three datasets, namely Something-Something V1, Something-Something V2, and Kinetics-400, show that the resulting network outperforms state-of-the-art motion recognition networks.
format	Online Article Text
id	pubmed-8947561
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8947561 2022-03-25 A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention Yang, Qi; Lu, Tongwei; Zhou, Huabing Entropy (Basel) Article Temporal modeling is key to action recognition in videos, but traditional 2D CNNs do not capture temporal relationships well. 3D CNNs can achieve good performance, but are computationally intensive and impractical on existing devices. To address these problems, we design a generic and effective module called spatio-temporal motion network (SMNet). SMNet maintains the complexity of 2D CNNs and reduces the computational effort of the algorithm while achieving performance comparable to 3D CNNs. SMNet contains a spatio-temporal excitation module (SE) and a motion excitation module (ME). The SE module uses group convolution to fuse temporal information, which reduces the number of parameters in the network, and uses spatial attention to extract spatial information. The ME module uses the difference between adjacent frames to extract feature-level motion patterns, which effectively encodes motion features and helps identify actions efficiently. We use ResNet-50 as the backbone network and insert SMNet into the residual blocks to form a simple and effective action network. Experimental results on three datasets, namely Something-Something V1, Something-Something V2, and Kinetics-400, show that the resulting network outperforms state-of-the-art motion recognition networks. MDPI 2022-03-04 /pmc/articles/PMC8947561/ /pubmed/35327879 http://dx.doi.org/10.3390/e24030368 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8947561/ https://www.ncbi.nlm.nih.gov/pubmed/35327879 http://dx.doi.org/10.3390/e24030368

Similar Items