STA-TSN: Spatial-Temporal Attention Temporal Segment Network for action recognition in video

Most deep learning-based action recognition models focus only on short-term motion, so they often misjudge actions that consist of multiple stages, such as the long jump or high jump. Temporal Segment Networks (TSN) enable the network to capture long-term...

Bibliographic Details
Main Authors:	Yang, Guoan; Yang, Yong; Lu, Zhengzhi; Yang, Junjie; Liu, Deyang; Zhou, Chuanbo; Fan, Zien
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8929560/ https://www.ncbi.nlm.nih.gov/pubmed/35298497 http://dx.doi.org/10.1371/journal.pone.0265115

_version_	1784670882257436672
author	Yang, Guoan; Yang, Yong; Lu, Zhengzhi; Yang, Junjie; Liu, Deyang; Zhou, Chuanbo; Fan, Zien
author_facet	Yang, Guoan; Yang, Yong; Lu, Zhengzhi; Yang, Junjie; Liu, Deyang; Zhou, Chuanbo; Fan, Zien
author_sort	Yang, Guoan
collection	PubMed
description	Most deep learning-based action recognition models focus only on short-term motion, so they often misjudge actions that consist of multiple stages, such as the long jump or high jump. Temporal Segment Networks (TSN) enable a network to capture long-term information in a video, but they ignore the fact that unrelated frames or regions in the video can strongly interfere with action recognition. To solve this problem, a soft attention mechanism is introduced into TSN, yielding the Spatial-Temporal Attention Temporal Segment Network (STA-TSN), which retains the ability to capture long-term information and enables the network to adaptively focus on key features in space and time. First, a multi-scale spatial focus feature enhancement strategy fuses the original convolutional features with multi-scale spatial focus features obtained through a soft attention mechanism with spatial pyramid pooling. Second, a deep learning-based key frame exploration module uses a soft attention mechanism based on Long Short-Term Memory (LSTM) to adaptively learn temporal attention weights. Third, a temporal-attention regularization is developed to guide STA-TSN toward better key frame exploration. Finally, experimental results show that STA-TSN outperforms TSN on four public datasets (UCF101, HMDB51, JHMDB and THUMOS14) and achieves state-of-the-art results.
format	Online Article Text
id	pubmed-8929560
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-8929560 2022-03-18 PLoS One Research Article Public Library of Science 2022-03-17 /pmc/articles/PMC8929560/ /pubmed/35298497 http://dx.doi.org/10.1371/journal.pone.0265115 Text en © 2022 Yang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	STA-TSN: Spatial-Temporal Attention Temporal Segment Network for action recognition in video
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8929560/ https://www.ncbi.nlm.nih.gov/pubmed/35298497 http://dx.doi.org/10.1371/journal.pone.0265115
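
The description above mentions temporal attention weights that let the network emphasize key frames. As a rough illustration only, the sketch below shows softmax-weighted pooling of per-segment features in plain Python. It is not the authors' implementation: the function names, the toy feature vectors, and the hand-picked scores are all hypothetical, and in STA-TSN the scores would be produced by an LSTM-based module rather than supplied by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(segment_features, segment_scores):
    """Return (weights, pooled): softmax weights over segments and the
    weighted sum of the segment feature vectors."""
    weights = softmax(segment_scores)
    dim = len(segment_features[0])
    pooled = [0.0] * dim
    for w, feat in zip(weights, segment_features):
        for i in range(dim):
            pooled[i] += w * feat[i]
    return weights, pooled


# Hypothetical toy input: 3 video segments with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Assumed attention scores (hand-picked here, not learned).
scores = [0.0, 0.0, 2.0]
weights, pooled = temporal_attention_pool(features, scores)
```

With these scores the third segment receives the largest weight, so the pooled feature is pulled toward it; equal scores would reduce the pooling to a plain average over segments.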
