MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module
As a sub-field of video content analysis, action recognition has received extensive attention in recent years, which aims to recognize human actions in videos. Compared with a single image, video has a temporal dimension. Therefore, it is of great significance to extract the spatio-temporal informat...
Main author: | Zhang, Yi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9460449/ https://www.ncbi.nlm.nih.gov/pubmed/36081054 http://dx.doi.org/10.3390/s22176595 |
_version_ | 1784786750388830208 |
---|---|
author | Zhang, Yi |
collection | PubMed |
description | As a sub-field of video content analysis, action recognition has received extensive attention in recent years, which aims to recognize human actions in videos. Compared with a single image, video has a temporal dimension. Therefore, it is of great significance to extract the spatio-temporal information from videos for action recognition. In this paper, an efficient network to extract spatio-temporal information with relatively low computational load (dubbed MEST) is proposed. Firstly, a motion encoder to capture short-term motion cues between consecutive frames is developed, followed by a channel-wise spatio-temporal module to model long-term feature information. Moreover, the weight standardization method is applied to the convolution layers followed by batch normalization layers to expedite the training process and facilitate convergence. Experiments are conducted on five public datasets of action recognition, Something-Something-V1 and -V2, Jester, UCF101 and HMDB51, where MEST exhibits competitive performance compared to other popular methods. The results demonstrate the effectiveness of our network in terms of accuracy, computational cost and network scales. |
format | Online Article Text |
id | pubmed-9460449 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9460449 2022-09-10 MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module Zhang, Yi Sensors (Basel) Article As a sub-field of video content analysis, action recognition has received extensive attention in recent years, which aims to recognize human actions in videos. Compared with a single image, video has a temporal dimension. Therefore, it is of great significance to extract the spatio-temporal information from videos for action recognition. In this paper, an efficient network to extract spatio-temporal information with relatively low computational load (dubbed MEST) is proposed. Firstly, a motion encoder to capture short-term motion cues between consecutive frames is developed, followed by a channel-wise spatio-temporal module to model long-term feature information. Moreover, the weight standardization method is applied to the convolution layers followed by batch normalization layers to expedite the training process and facilitate convergence. Experiments are conducted on five public datasets of action recognition, Something-Something-V1 and -V2, Jester, UCF101 and HMDB51, where MEST exhibits competitive performance compared to other popular methods. The results demonstrate the effectiveness of our network in terms of accuracy, computational cost and network scales. MDPI 2022-09-01 /pmc/articles/PMC9460449/ /pubmed/36081054 http://dx.doi.org/10.3390/s22176595 Text en © 2022 by the author. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Zhang, Yi MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module |
title | MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9460449/ https://www.ncbi.nlm.nih.gov/pubmed/36081054 http://dx.doi.org/10.3390/s22176595 |