
Spatiotemporal Interaction Residual Networks with Pseudo3D for Video Action Recognition

Action recognition is a significant and challenging topic in the field of sensor and computer vision. Two-stream convolutional neural networks (CNNs) and 3D CNNs are two mainstream deep learning architectures for video action recognition. To combine them into one framework to further improve perform...

Bibliographic Details
Main Authors:	Chen, Jianyu; Kong, Jun; Sun, Hui; Xu, Hui; Liu, Xiaoli; Lu, Yinghua; Zheng, Caixia
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7308980/ https://www.ncbi.nlm.nih.gov/pubmed/32492842 http://dx.doi.org/10.3390/s20113126

_version_	1783549117669048320
author	Chen, Jianyu; Kong, Jun; Sun, Hui; Xu, Hui; Liu, Xiaoli; Lu, Yinghua; Zheng, Caixia
author_sort	Chen, Jianyu
collection	PubMed
description	Action recognition is a significant and challenging topic in the field of sensor and computer vision. Two-stream convolutional neural networks (CNNs) and 3D CNNs are two mainstream deep learning architectures for video action recognition. To combine them into one framework to further improve performance, we proposed a novel deep network, named the spatiotemporal interaction residual network with pseudo3D (STINP). The STINP possesses three advantages. First, the STINP consists of two branches constructed based on residual networks (ResNets) to simultaneously learn the spatial and temporal information of the video. Second, the STINP integrates the pseudo3D block into residual units for building the spatial branch, which ensures that the spatial branch can not only learn the appearance feature of the objects and scene in the video, but also capture the potential interaction information among the consecutive frames. Finally, the STINP adopts a simple but effective multiplication operation to fuse the spatial branch and temporal branch, which guarantees that the learned spatial and temporal representation can interact with each other during the entire process of training the STINP. Experiments were implemented on two classic action recognition datasets, UCF101 and HMDB51. The experimental results show that our proposed STINP can provide better performance for video recognition than other state-of-the-art algorithms.
format	Online Article Text
id	pubmed-7308980
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7308980 2020-06-25 Spatiotemporal Interaction Residual Networks with Pseudo3D for Video Action Recognition. Chen, Jianyu; Kong, Jun; Sun, Hui; Xu, Hui; Liu, Xiaoli; Lu, Yinghua; Zheng, Caixia. Sensors (Basel). Article. MDPI 2020-06-01 /pmc/articles/PMC7308980/ /pubmed/32492842 http://dx.doi.org/10.3390/s20113126 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Spatiotemporal Interaction Residual Networks with Pseudo3D for Video Action Recognition
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7308980/ https://www.ncbi.nlm.nih.gov/pubmed/32492842 http://dx.doi.org/10.3390/s20113126