Human action recognition method based on Motion Excitation and Temporal Aggregation module
To address the low modeling efficiency and feature loss of temporal modeling in human action recognition, we propose a human action recognition method based on a Motion Excitation and Temporal Aggregation module (META). The method captures multi-state and multi-scale temporal information...
Main Authors: | Ye, Qing; Tan, Zexian; Zhang, Yongmei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2022 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9647446/ https://www.ncbi.nlm.nih.gov/pubmed/36387431 http://dx.doi.org/10.1016/j.heliyon.2022.e11401 |
_version_ | 1784827384461000704 |
---|---|
author | Ye, Qing Tan, Zexian Zhang, Yongmei |
author_facet | Ye, Qing Tan, Zexian Zhang, Yongmei |
author_sort | Ye, Qing |
collection | PubMed |
description | To address the low modeling efficiency and feature loss of temporal modeling in human action recognition, we propose a human action recognition method based on a Motion Excitation and Temporal Aggregation module (META). The method captures multi-state and multi-scale temporal information to achieve effective motion excitation. First, temporal relational sampling is performed on the video frames. Second, META is applied to capture multi-state and multi-scale temporal information. META consists of a Multi-scale Motion Excitation module (MME) and a Squeeze-and-Excitation Temporal Aggregation module (SETA). MME captures feature-level temporal differences by transforming the features into the temporal channel, which directly establishes the relationship between features and the temporal channel and resolves the low modeling efficiency. SETA transforms a local convolution into a set of sub-convolutions; the sub-convolutions form a hierarchy that extracts features jointly, each sharing the output of the preceding convolutional layer, which enlarges the final temporal receptive field and resolves the feature loss. In addition, optical-flow features are extracted through cross-modality pre-training to improve the utilization of temporal information. Finally, recognition is performed by fusing the spatial and temporal two-stream features. Experimental results show that the method achieves 96.0% accuracy on UCF101 and 71.2% on HMDB-51, higher than contemporaneous methods. Illustrative sketches of MME, SETA, and the cross-modality weight transfer follow the record below. |
format | Online Article Text |
id | pubmed-9647446 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9647446 2022-11-15 Human action recognition method based on Motion Excitation and Temporal Aggregation module Ye, Qing; Tan, Zexian; Zhang, Yongmei Heliyon Research Article Elsevier 2022-11-04 /pmc/articles/PMC9647446/ /pubmed/36387431 http://dx.doi.org/10.1016/j.heliyon.2022.e11401 Text en © 2022 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Research Article Ye, Qing Tan, Zexian Zhang, Yongmei Human action recognition method based on Motion Excitation and Temporal Aggregation module |
title | Human action recognition method based on Motion Excitation and Temporal Aggregation module |
title_full | Human action recognition method based on Motion Excitation and Temporal Aggregation module |
title_fullStr | Human action recognition method based on Motion Excitation and Temporal Aggregation module |
title_full_unstemmed | Human action recognition method based on Motion Excitation and Temporal Aggregation module |
title_short | Human action recognition method based on Motion Excitation and Temporal Aggregation module |
title_sort | human action recognition method based on motion excitation and temporal aggregation module |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9647446/ https://www.ncbi.nlm.nih.gov/pubmed/36387431 http://dx.doi.org/10.1016/j.heliyon.2022.e11401 |
work_keys_str_mv | AT yeqing humanactionrecognitionmethodbasedonmotionexcitationandtemporalaggregationmodule AT tanzexian humanactionrecognitionmethodbasedonmotionexcitationandtemporalaggregationmodule AT zhangyongmei humanactionrecognitionmethodbasedonmotionexcitationandtemporalaggregationmodule |
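
The record contains no code, so the following is a minimal, illustrative PyTorch sketch of a motion-excitation block in the spirit of the MME module described in the abstract: features are channel-squeezed, adjacent frames are differenced, and the pooled difference gates the original features. The class name, the reduction ratio, and the residual gating are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of a motion-excitation block (MME-style); not the paper's code.
import torch
import torch.nn as nn

class MotionExcitation(nn.Module):
    """Excites motion-sensitive channels from feature-level temporal differences.

    Input:  (N*T, C, H, W) frame-level features, with T frames per clip.
    Output: same shape, reweighted by a sigmoid attention computed from the
            difference between adjacent frames.
    """
    def __init__(self, channels: int, n_segments: int, reduction: int = 16):
        super().__init__()
        self.n_segments = n_segments
        squeezed = channels // reduction
        self.squeeze = nn.Conv2d(channels, squeezed, kernel_size=1, bias=False)
        # Spatial conv applied to the next frame's features before differencing.
        self.transform = nn.Conv2d(squeezed, squeezed, kernel_size=3,
                                   padding=1, groups=squeezed, bias=False)
        self.expand = nn.Conv2d(squeezed, channels, kernel_size=1, bias=False)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        nt, c, h, w = x.shape
        t = self.n_segments
        n = nt // t
        s = self.squeeze(x).view(n, t, -1, h, w)            # (N, T, C/r, H, W)
        # Temporal difference: conv(x_{t+1}) - x_t, zero-padded at the last step.
        nxt = self.transform(s[:, 1:].reshape(-1, s.size(2), h, w))
        nxt = nxt.view(n, t - 1, -1, h, w)
        diff = nxt - s[:, :-1]                              # (N, T-1, C/r, H, W)
        diff = torch.cat([diff, torch.zeros_like(s[:, :1])], dim=1)
        diff = diff.view(nt, -1, h, w)
        attn = self.sigmoid(self.expand(self.pool(diff)))   # (N*T, C, 1, 1)
        return x + x * attn                                 # residual excitation
```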
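Likewise, a hedged sketch of the hierarchical sub-convolution idea behind SETA: the channel dimension is split into groups, and each group's temporal convolution receives the previous group's output, so later groups see a progressively larger temporal receptive field. The group count, kernel size, and depthwise Conv1d are illustrative assumptions rather than the authors' configuration.

```python
# Hedged sketch of hierarchical temporal sub-convolutions (SETA-style).
import torch
import torch.nn as nn

class TemporalAggregation(nn.Module):
    """Res2Net-style hierarchy of temporal sub-convolutions over T frames."""
    def __init__(self, channels: int, n_segments: int, scales: int = 4):
        super().__init__()
        assert channels % scales == 0
        self.n_segments = n_segments
        self.scales = scales
        width = channels // scales
        # One temporal conv (kernel 3 over T) per sub-group after the first.
        self.convs = nn.ModuleList(
            nn.Conv1d(width, width, kernel_size=3, padding=1,
                      groups=width, bias=False)
            for _ in range(scales - 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        nt, c, h, w = x.shape
        t = self.n_segments
        n = nt // t
        # (N*T, C, H, W) -> (N*H*W, C, T) so Conv1d runs along time.
        xt = x.view(n, t, c, h, w).permute(0, 3, 4, 2, 1).reshape(-1, c, t)
        groups = torch.chunk(xt, self.scales, dim=1)
        out, prev = [groups[0]], groups[0]
        for conv, g in zip(self.convs, groups[1:]):
            prev = conv(g + prev)       # share the previous stage's output
            out.append(prev)
        yt = torch.cat(out, dim=1)      # (N*H*W, C, T)
        return yt.view(n, h, w, c, t).permute(0, 4, 3, 1, 2).reshape(nt, c, h, w)
```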
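Finally, cross-modality pre-training is commonly realized (as in two-stream work such as TSN) by averaging an ImageNet-pretrained first-layer RGB kernel across its three input channels and replicating it to match the stacked optical-flow input. Below is a sketch under that assumption; the helper name and the 10-channel flow stack (5 frames of x/y flow) are illustrative.

```python
# Hedged sketch of cross-modality weight transfer for the flow stream.
import torch.nn as nn
from torchvision.models import resnet50

def cross_modality_conv1(rgb_conv: nn.Conv2d, flow_channels: int) -> nn.Conv2d:
    """Adapt an RGB-pretrained first conv to a stacked optical-flow input by
    averaging the kernel over its 3 input channels and replicating it."""
    w = rgb_conv.weight.data                      # (out, 3, kH, kW)
    mean_w = w.mean(dim=1, keepdim=True)          # (out, 1, kH, kW)
    new_conv = nn.Conv2d(flow_channels, rgb_conv.out_channels,
                         kernel_size=rgb_conv.kernel_size,
                         stride=rgb_conv.stride,
                         padding=rgb_conv.padding,
                         bias=rgb_conv.bias is not None)
    new_conv.weight.data = mean_w.repeat(1, flow_channels, 1, 1)
    return new_conv

# Usage: swap the first conv of an ImageNet-pretrained backbone.
backbone = resnet50(weights="IMAGENET1K_V1")
backbone.conv1 = cross_modality_conv1(backbone.conv1, flow_channels=10)
```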