MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module

As a sub-field of video content analysis, action recognition has received extensive attention in recent years, which aims to recognize human actions in videos. Compared with a single image, video has a temporal dimension. Therefore, it is of great significance to extract the spatio-temporal informat...

Bibliographic Details
Main author: Zhang, Yi
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9460449/
https://www.ncbi.nlm.nih.gov/pubmed/36081054
http://dx.doi.org/10.3390/s22176595
_version_ 1784786750388830208
author Zhang, Yi
author_facet Zhang, Yi
author_sort Zhang, Yi
collection PubMed
description As a sub-field of video content analysis, action recognition has received extensive attention in recent years, which aims to recognize human actions in videos. Compared with a single image, video has a temporal dimension. Therefore, it is of great significance to extract the spatio-temporal information from videos for action recognition. In this paper, an efficient network to extract spatio-temporal information with relatively low computational load (dubbed MEST) is proposed. Firstly, a motion encoder to capture short-term motion cues between consecutive frames is developed, followed by a channel-wise spatio-temporal module to model long-term feature information. Moreover, the weight standardization method is applied to the convolution layers followed by batch normalization layers to expedite the training process and facilitate convergence. Experiments are conducted on five public datasets of action recognition, Something-Something-V1 and -V2, Jester, UCF101 and HMDB51, where MEST exhibits competitive performance compared to other popular methods. The results demonstrate the effectiveness of our network in terms of accuracy, computational cost and network scales.
format Online
Article
Text
id pubmed-9460449
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-94604492022-09-10 MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module Zhang, Yi Sensors (Basel) Article As a sub-field of video content analysis, action recognition has received extensive attention in recent years, which aims to recognize human actions in videos. Compared with a single image, video has a temporal dimension. Therefore, it is of great significance to extract the spatio-temporal information from videos for action recognition. In this paper, an efficient network to extract spatio-temporal information with relatively low computational load (dubbed MEST) is proposed. Firstly, a motion encoder to capture short-term motion cues between consecutive frames is developed, followed by a channel-wise spatio-temporal module to model long-term feature information. Moreover, the weight standardization method is applied to the convolution layers followed by batch normalization layers to expedite the training process and facilitate convergence. Experiments are conducted on five public datasets of action recognition, Something-Something-V1 and -V2, Jester, UCF101 and HMDB51, where MEST exhibits competitive performance compared to other popular methods. The results demonstrate the effectiveness of our network in terms of accuracy, computational cost and network scales. MDPI 2022-09-01 /pmc/articles/PMC9460449/ /pubmed/36081054 http://dx.doi.org/10.3390/s22176595 Text en © 2022 by the author. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Zhang, Yi
MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module
title MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module
title_full MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module
title_fullStr MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module
title_full_unstemmed MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module
title_short MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module
title_sort mest: an action recognition network with motion encoder and spatio-temporal module
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9460449/
https://www.ncbi.nlm.nih.gov/pubmed/36081054
http://dx.doi.org/10.3390/s22176595
work_keys_str_mv AT zhangyi mestanactionrecognitionnetworkwithmotionencoderandspatiotemporalmodule