
3D network with channel excitation and knowledge distillation for action recognition

Modern action recognition techniques frequently employ two networks: the spatial stream, which takes RGB frames as input, and the temporal stream, which takes optical flow as input. Recent research uses 3D convolutional neural networks that apply spatiotemporal filters in both streams. Alt...

Bibliographic Details
Main authors:	Hu, Zhengping, Mao, Jianzeng, Yao, Jianxin, Bi, Shuai
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Neuroscience
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10076829/ https://www.ncbi.nlm.nih.gov/pubmed/37033413 http://dx.doi.org/10.3389/fnbot.2023.1050167

_version_	1785020221301456896
author	Hu, Zhengping; Mao, Jianzeng; Yao, Jianxin; Bi, Shuai
collection	PubMed
description	Modern action recognition techniques frequently employ two networks: the spatial stream, which takes RGB frames as input, and the temporal stream, which takes optical flow as input. Recent research uses 3D convolutional neural networks that apply spatiotemporal filters in both streams. Although combining flow with RGB improves performance, accurate optical flow computation is expensive and adds latency to action recognition. In this study, we present a method for training a 3D CNN on RGB frames that emulates the motion stream and therefore does not require flow computation at test time. First, as an alternative to the SE block, we propose a channel excitation module (CE module). Experiments show that the CE module improves the feature extraction capability of a 3D network and outperforms the SE block. Second, for action recognition training, we adopt a linear combination of a knowledge distillation loss and the standard cross-entropy loss to effectively leverage both appearance and motion information. The stream trained with this combined loss is called the Intensified Motion RGB Stream (IMRS). IMRS outperforms either the RGB or the Flow stream used alone; for example, on HMDB51 it achieves 73.5% accuracy, while the RGB and Flow streams score 65.6% and 69.1%, respectively. Extensive experiments confirm the effectiveness of the proposed method, and comparison with other models shows that it is competitive for action recognition.
format	Online Article Text
id	pubmed-10076829
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-100768292023-04-07 3D network with channel excitation and knowledge distillation for action recognition Hu, Zhengping Mao, Jianzeng Yao, Jianxin Bi, Shuai Front Neurorobot Neuroscience Modern action recognition techniques frequently employ two networks: the spatial stream, which accepts input from RGB frames, and the temporal stream, which accepts input from optical flow. Recent researches use 3D convolutional neural networks that employ spatiotemporal filters on both streams. Although mixing flow with RGB enhances performance, correct optical flow computation is expensive and adds delay to action recognition. In this study, we present a method for training a 3D CNN using RGB frames that replicates the motion stream and, as a result, does not require flow calculation during testing. To begin, in contrast to the SE block, we suggest a channel excitation module (CE module). Experiments have shown that the CE module can improve the feature extraction capabilities of a 3D network and that the effect is superior to the SE block. Second, for action recognition training, we adopt a linear mix of loss based on knowledge distillation and standard cross-entropy loss to effectively leverage appearance and motion information. The Intensified Motion RGB Stream is the stream trained with this combined loss (IMRS). IMRS surpasses RGB or Flow as a single stream; for example, HMDB51 achieves 73.5% accuracy, while RGB and Flow streams score 65.6% and 69.1% accuracy, respectively. Extensive experiments confirm the effectiveness of our proposed method. The comparison with other models proves that our model has good competitiveness in behavior recognition. Frontiers Media S.A. 2023-03-23 /pmc/articles/PMC10076829/ /pubmed/37033413 http://dx.doi.org/10.3389/fnbot.2023.1050167 Text en Copyright © 2023 Hu, Mao, Yao and Bi. 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	3D network with channel excitation and knowledge distillation for action recognition
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10076829/ https://www.ncbi.nlm.nih.gov/pubmed/37033413 http://dx.doi.org/10.3389/fnbot.2023.1050167
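The record's description mentions a linear mix of a knowledge distillation loss and standard cross-entropy loss but does not give its exact form. The Python sketch below shows one common way to build such a loss, assuming Hinton-style distillation on temperature-softened logits, with a flow-trained network as the teacher and the RGB stream as the student. The function names and the defaults for `alpha` (mixing weight) and `temperature` are hypothetical and not taken from the paper, which may instead distill intermediate features rather than logits.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard cross-entropy of the student's prediction against the ground-truth class."""
    return -math.log(softmax(logits)[label])


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 4.0) -> float:
    """KL(teacher || student) between temperature-softened distributions.

    Multiplied by T^2 so its gradient scale stays comparable to the
    cross-entropy term as T changes (Hinton-style distillation; an assumption here).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0  # treat 0 * log 0 as 0
    )


def combined_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                  label: int, alpha: float = 0.5, temperature: float = 4.0) -> float:
    """Linear mix alpha * KD + (1 - alpha) * CE (hypothetical default hyperparameters)."""
    return (alpha * kd_loss(student_logits, teacher_logits, temperature)
            + (1.0 - alpha) * cross_entropy(student_logits, label))


# Hypothetical logits for one clip over three classes.
rgb_logits = [2.0, 0.5, -1.0]   # student: RGB stream being trained
flow_logits = [3.0, 0.2, -2.0]  # teacher: flow stream, kept fixed
loss = combined_loss(rgb_logits, flow_logits, label=0, alpha=0.5)
```

Setting `alpha=0` reduces this to plain cross-entropy training of the RGB stream, while `alpha=1` trains it purely to match the flow teacher; the balance between the two is a tuning choice.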

Similar Items