STAC: Spatial-Temporal Attention on Compensation Information for Activity Recognition in FPV
Egocentric activity recognition in first-person video (FPV) requires fine-grained matching between the camera wearer’s actions and the objects being manipulated. Traditional third-person action recognition methods do not suffice because of (1) the background ego-noise introduced by the unstructured movement of the wearable device as the body moves, and (2) the small, fine-grained objects that appear at a single scale in FPV. We perform size compensation to augment the data: it generates a multi-scale set of regions containing objects of multiple sizes, leading to superior performance. We also compensate the optical flow to eliminate camera-motion noise. We developed a novel two-stream convolutional neural network-recurrent attention neural network (CNN-RAN) architecture, spatial-temporal attention on compensation information (STAC), that generates generic descriptors under weak supervision, focuses on the locations of activated objects, and captures the effective motion. We encode the RGB features using a spatial location-aware attention mechanism to guide the representation of visual features. A similar location-aware channel attention is applied to the temporal stream, in the form of stacked optical flow, to implicitly select the relevant frames and attend to where the action occurs. The two streams are complementary: one is object-centric and the other focuses on motion. We conducted extensive ablation analysis to validate the complementarity and effectiveness of our STAC model qualitatively and quantitatively. It achieved state-of-the-art performance on two egocentric datasets.
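The size compensation described in the abstract generates a multi-scale set of regions so that the small, single-scale objects of FPV appear at multiple sizes. A minimal sketch, assuming this amounts to cropping each frame at several scales and resizing the crops to the network input size; the scale values, center-crop policy, and function name below are illustrative, not taken from the paper:

```python
# Hypothetical multi-scale region generation for size compensation.
# Scales and crop policy are assumptions, not the paper's exact recipe.
import torch
from PIL import Image
import torchvision.transforms.functional as F

def multi_scale_regions(frame: Image.Image,
                        scales=(1.0, 0.75, 0.5),
                        out_size=(224, 224)) -> list:
    """Crop the frame at several scales; resize every crop to out_size."""
    w, h = frame.size                       # PIL reports (width, height)
    regions = []
    for s in scales:
        ch, cw = int(h * s), int(w * s)     # crop size at this scale
        crop = F.center_crop(frame, [ch, cw])
        crop = F.resize(crop, list(out_size))
        regions.append(F.to_tensor(crop))   # C x H x W tensor in [0, 1]
    return regions                          # one augmented view per scale
```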
Main Authors: | Zhang, Yue; Sun, Shengli; Lei, Linjian; Liu, Huikai; Xie, Hui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2021 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7914484/ https://www.ncbi.nlm.nih.gov/pubmed/33562612 http://dx.doi.org/10.3390/s21041106 |
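The optical-flow compensation mentioned in the abstract removes the background ego-noise that camera motion adds to the flow field. A minimal sketch of one standard approach, assuming homography subtraction (fit the global camera motion to sparse feature tracks, then subtract the flow that motion alone would induce); the paper's exact compensation procedure may differ:

```python
# Hypothetical camera-motion compensation for dense optical flow.
# Function choices (Farneback flow, LK tracking, RANSAC homography) are
# standard stand-ins, not necessarily what the paper uses.
import cv2
import numpy as np

def compensated_flow(prev_gray: np.ndarray, curr_gray: np.ndarray) -> np.ndarray:
    """Return dense flow with the global, camera-induced motion removed."""
    # 1. Observed dense flow between consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # 2. Global camera motion as a homography fit to sparse feature tracks.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    ok = status.ravel() == 1
    H, _ = cv2.findHomography(pts[ok], nxt[ok], cv2.RANSAC, 3.0)

    # 3. Flow that the camera motion alone would induce at every pixel.
    h, w = prev_gray.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    base = np.stack([xs, ys], axis=-1).astype(np.float32)
    warped = cv2.perspectiveTransform(base.reshape(-1, 1, 2), H).reshape(h, w, 2)
    camera_flow = warped - base

    # 4. Residual (compensated) flow = observed flow minus camera flow.
    return flow - camera_flow
```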
_version_ | 1783657013330313216 |
---|---|
author | Zhang, Yue; Sun, Shengli; Lei, Linjian; Liu, Huikai; Xie, Hui
author_facet | Zhang, Yue; Sun, Shengli; Lei, Linjian; Liu, Huikai; Xie, Hui
author_sort | Zhang, Yue |
collection | PubMed |
description | Egocentric activity recognition in first-person video (FPV) requires fine-grained matching between the camera wearer’s actions and the objects being manipulated. Traditional third-person action recognition methods do not suffice because of (1) the background ego-noise introduced by the unstructured movement of the wearable device as the body moves, and (2) the small, fine-grained objects that appear at a single scale in FPV. We perform size compensation to augment the data: it generates a multi-scale set of regions containing objects of multiple sizes, leading to superior performance. We also compensate the optical flow to eliminate camera-motion noise. We developed a novel two-stream convolutional neural network-recurrent attention neural network (CNN-RAN) architecture, spatial-temporal attention on compensation information (STAC), that generates generic descriptors under weak supervision, focuses on the locations of activated objects, and captures the effective motion. We encode the RGB features using a spatial location-aware attention mechanism to guide the representation of visual features. A similar location-aware channel attention is applied to the temporal stream, in the form of stacked optical flow, to implicitly select the relevant frames and attend to where the action occurs. The two streams are complementary: one is object-centric and the other focuses on motion. We conducted extensive ablation analysis to validate the complementarity and effectiveness of our STAC model qualitatively and quantitatively. It achieved state-of-the-art performance on two egocentric datasets.
format | Online Article Text |
id | pubmed-7914484 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7914484 2021-03-01 STAC: Spatial-Temporal Attention on Compensation Information for Activity Recognition in FPV Zhang, Yue; Sun, Shengli; Lei, Linjian; Liu, Huikai; Xie, Hui Sensors (Basel) Article Egocentric activity recognition in first-person video (FPV) requires fine-grained matching between the camera wearer’s actions and the objects being manipulated. Traditional third-person action recognition methods do not suffice because of (1) the background ego-noise introduced by the unstructured movement of the wearable device as the body moves, and (2) the small, fine-grained objects that appear at a single scale in FPV. We perform size compensation to augment the data: it generates a multi-scale set of regions containing objects of multiple sizes, leading to superior performance. We also compensate the optical flow to eliminate camera-motion noise. We developed a novel two-stream convolutional neural network-recurrent attention neural network (CNN-RAN) architecture, spatial-temporal attention on compensation information (STAC), that generates generic descriptors under weak supervision, focuses on the locations of activated objects, and captures the effective motion. We encode the RGB features using a spatial location-aware attention mechanism to guide the representation of visual features. A similar location-aware channel attention is applied to the temporal stream, in the form of stacked optical flow, to implicitly select the relevant frames and attend to where the action occurs. The two streams are complementary: one is object-centric and the other focuses on motion. We conducted extensive ablation analysis to validate the complementarity and effectiveness of our STAC model qualitatively and quantitatively. It achieved state-of-the-art performance on two egocentric datasets. MDPI 2021-02-05 /pmc/articles/PMC7914484/ /pubmed/33562612 http://dx.doi.org/10.3390/s21041106 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle | Article Zhang, Yue; Sun, Shengli; Lei, Linjian; Liu, Huikai; Xie, Hui STAC: Spatial-Temporal Attention on Compensation Information for Activity Recognition in FPV
title | STAC: Spatial-Temporal Attention on Compensation Information for Activity Recognition in FPV |
title_full | STAC: Spatial-Temporal Attention on Compensation Information for Activity Recognition in FPV |
title_fullStr | STAC: Spatial-Temporal Attention on Compensation Information for Activity Recognition in FPV |
title_full_unstemmed | STAC: Spatial-Temporal Attention on Compensation Information for Activity Recognition in FPV |
title_short | STAC: Spatial-Temporal Attention on Compensation Information for Activity Recognition in FPV |
title_sort | stac: spatial-temporal attention on compensation information for activity recognition in fpv |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7914484/ https://www.ncbi.nlm.nih.gov/pubmed/33562612 http://dx.doi.org/10.3390/s21041106 |
work_keys_str_mv | AT zhangyue stacspatialtemporalattentiononcompensationinformationforactivityrecognitioninfpv AT sunshengli stacspatialtemporalattentiononcompensationinformationforactivityrecognitioninfpv AT leilinjian stacspatialtemporalattentiononcompensationinformationforactivityrecognitioninfpv AT liuhuikai stacspatialtemporalattentiononcompensationinformationforactivityrecognitioninfpv AT xiehui stacspatialtemporalattentiononcompensationinformationforactivityrecognitioninfpv |
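The description field above names the two attention mechanisms at the core of STAC: spatial location-aware attention over RGB features (the object-centric stream) and channel attention over stacked-optical-flow features (the motion stream), combined as a two-stream network. A minimal PyTorch sketch of these ideas; the module shapes, the squeeze-and-excitation-style channel gate, and the score-averaging fusion are illustrative assumptions, not the paper's exact CNN-RAN architecture:

```python
# Illustrative PyTorch modules for the two attention mechanisms; shapes,
# the channel gate, and averaging fusion are assumptions for this sketch.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Re-weight each spatial location of an RGB feature map (object stream)."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-location score

    def forward(self, feat: torch.Tensor) -> torch.Tensor:  # feat: B x C x H x W
        b, _, h, w = feat.shape
        attn = torch.softmax(self.score(feat).view(b, 1, h * w), dim=-1)
        return feat * attn.view(b, 1, h, w)

class ChannelAttention(nn.Module):
    """Gate channels of stacked-flow features, implicitly selecting frames."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, feat: torch.Tensor) -> torch.Tensor:  # feat: B x C x H x W
        gate = self.fc(feat.mean(dim=(2, 3)))                # B x C channel weights
        return feat * gate[:, :, None, None]

class TwoStreamSTAC(nn.Module):
    """Object-centric RGB stream + motion stream, fused by score averaging."""
    def __init__(self, rgb_backbone: nn.Module, flow_backbone: nn.Module,
                 channels: int, num_classes: int):
        super().__init__()
        self.rgb_backbone, self.flow_backbone = rgb_backbone, flow_backbone
        self.spatial_attn = SpatialAttention(channels)
        self.channel_attn = ChannelAttention(channels)
        self.rgb_head = nn.Linear(channels, num_classes)
        self.flow_head = nn.Linear(channels, num_classes)

    def forward(self, rgb: torch.Tensor, flow_stack: torch.Tensor) -> torch.Tensor:
        f_rgb = self.spatial_attn(self.rgb_backbone(rgb))           # where the objects are
        f_flow = self.channel_attn(self.flow_backbone(flow_stack))  # which motion matters
        logits_rgb = self.rgb_head(f_rgb.mean(dim=(2, 3)))          # global average pool
        logits_flow = self.flow_head(f_flow.mean(dim=(2, 3)))
        return (logits_rgb + logits_flow) / 2                       # late fusion
```

Any 2D backbone that outputs B x C x H x W feature maps (for example, a ResNet truncated before its pooling layer) can serve as either stream's backbone in this sketch.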