
Fusion of Appearance and Motion Features for Daily Activity Recognition from Egocentric Perspective

Videos from a first-person or egocentric perspective offer a promising tool for recognizing various activities of daily living. In the egocentric perspective, the video is captured by a wearable camera, which records the person's activities from a consistent viewpoint. Recog...


Bibliographic Details
Main Authors: Lye, Mohd Haris, AlDahoul, Nouar, Abdul Karim, Hezerul
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10422501/
https://www.ncbi.nlm.nih.gov/pubmed/37571588
http://dx.doi.org/10.3390/s23156804
_version_ 1785089225937387520
author Lye, Mohd Haris
AlDahoul, Nouar
Abdul Karim, Hezerul
author_facet Lye, Mohd Haris
AlDahoul, Nouar
Abdul Karim, Hezerul
author_sort Lye, Mohd Haris
collection PubMed
description Videos from a first-person or egocentric perspective offer a promising tool for recognizing various activities of daily living. In the egocentric perspective, the video is captured by a wearable camera, which records the person's activities from a consistent viewpoint. Recognizing activities with a wearable sensor is challenging for several reasons, such as motion blur and large variations. Existing methods extract handcrafted features from video frames to represent their content; such features are domain-dependent, and features that suit one dataset may not suit another. In this paper, we propose a novel solution to recognize daily living activities from a pre-segmented video clip. The pre-trained convolutional neural network (CNN) model VGG16 is used to extract visual features from sampled video frames, and these features are then aggregated by the proposed pooling schemes. The proposed solution combines appearance and motion features extracted from video frames and optical-flow images, respectively. Mean and max spatial pooling (MMSP) and temporal pyramid max-mean (TPMM) pooling are proposed to compose the final video descriptor. The descriptor is fed to a linear support vector machine (SVM) to recognize the type of activity observed in the video clip. The proposed solution was evaluated on three public benchmark datasets, and we performed studies to show the advantage of aggregating appearance and motion features for daily activity recognition. The results show that the proposed solution is promising for recognizing activities of daily living. Compared with several methods on the three public datasets, the proposed MMSP–TPMM method achieves higher classification performance in terms of accuracy (90.38% on the LENA dataset, 75.37% on the ADL dataset, and 96.08% on the FPPA dataset) and average per-class precision (AP) (58.42% on the ADL dataset and 96.11% on the FPPA dataset).
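The abstract describes a concrete pipeline: per-frame CNN features are pooled spatially (MMSP), pooled over time with a temporal pyramid (TPMM), fused across the appearance and motion branches, and classified with a linear SVM. The sketch below is a minimal illustration of that flow, not the authors' implementation: the pyramid depth, the exact MMSP/TPMM arithmetic, and the helper names (mmsp, tpmm, clip_descriptor) are assumptions made for illustration.

```python
# Minimal sketch of the pipeline described in the abstract; NOT the
# authors' code, only an illustration of the stated design.
import numpy as np
from sklearn.svm import LinearSVC  # linear SVM, as named in the abstract

def mmsp(feature_maps):
    """Mean and max spatial pooling (MMSP).

    feature_maps: (T, C, H, W) CNN activations for T sampled frames,
    e.g. from a VGG16 convolutional layer. Each frame's C x H x W map
    is mean- and max-pooled over the H*W spatial grid and the two
    statistics are concatenated, giving (T, 2C) per-frame descriptors.
    """
    t, c, h, w = feature_maps.shape
    flat = feature_maps.reshape(t, c, h * w)
    return np.concatenate([flat.mean(axis=2), flat.max(axis=2)], axis=1)

def tpmm(frame_descriptors, levels=2):
    """Temporal pyramid max-mean (TPMM) pooling (pyramid layout assumed).

    At level l the clip is split into 2**l equal temporal segments;
    each segment is max-pooled over time and the segment results are
    averaged. Per-level vectors are concatenated into one descriptor.
    """
    parts = []
    for level in range(levels + 1):
        segments = np.array_split(frame_descriptors, 2 ** level, axis=0)
        seg_max = np.stack([seg.max(axis=0) for seg in segments])
        parts.append(seg_max.mean(axis=0))
    return np.concatenate(parts)

def clip_descriptor(rgb_maps, flow_maps):
    """Fuse the appearance (RGB frame) and motion (optical-flow image)
    branches by concatenating their pooled descriptors."""
    return np.concatenate([tpmm(mmsp(rgb_maps)), tpmm(mmsp(flow_maps))])

# Hypothetical usage with pre-extracted features for labeled clips:
# X = np.stack([clip_descriptor(rgb, flow) for rgb, flow in clip_features])
# classifier = LinearSVC().fit(X, labels)
```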
format Online
Article
Text
id pubmed-10422501
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10422501 2023-08-13 Fusion of Appearance and Motion Features for Daily Activity Recognition from Egocentric Perspective Lye, Mohd Haris AlDahoul, Nouar Abdul Karim, Hezerul Sensors (Basel) Article Videos from a first-person or egocentric perspective offer a promising tool for recognizing various activities of daily living. In the egocentric perspective, the video is captured by a wearable camera, which records the person's activities from a consistent viewpoint. Recognizing activities with a wearable sensor is challenging for several reasons, such as motion blur and large variations. Existing methods extract handcrafted features from video frames to represent their content; such features are domain-dependent, and features that suit one dataset may not suit another. In this paper, we propose a novel solution to recognize daily living activities from a pre-segmented video clip. The pre-trained convolutional neural network (CNN) model VGG16 is used to extract visual features from sampled video frames, and these features are then aggregated by the proposed pooling schemes. The proposed solution combines appearance and motion features extracted from video frames and optical-flow images, respectively. Mean and max spatial pooling (MMSP) and temporal pyramid max-mean (TPMM) pooling are proposed to compose the final video descriptor. The descriptor is fed to a linear support vector machine (SVM) to recognize the type of activity observed in the video clip. The proposed solution was evaluated on three public benchmark datasets, and we performed studies to show the advantage of aggregating appearance and motion features for daily activity recognition. The results show that the proposed solution is promising for recognizing activities of daily living. Compared with several methods on the three public datasets, the proposed MMSP–TPMM method achieves higher classification performance in terms of accuracy (90.38% on the LENA dataset, 75.37% on the ADL dataset, and 96.08% on the FPPA dataset) and average per-class precision (AP) (58.42% on the ADL dataset and 96.11% on the FPPA dataset). MDPI 2023-07-30 /pmc/articles/PMC10422501/ /pubmed/37571588 http://dx.doi.org/10.3390/s23156804 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Lye, Mohd Haris
AlDahoul, Nouar
Abdul Karim, Hezerul
Fusion of Appearance and Motion Features for Daily Activity Recognition from Egocentric Perspective
title Fusion of Appearance and Motion Features for Daily Activity Recognition from Egocentric Perspective
title_full Fusion of Appearance and Motion Features for Daily Activity Recognition from Egocentric Perspective
title_fullStr Fusion of Appearance and Motion Features for Daily Activity Recognition from Egocentric Perspective
title_full_unstemmed Fusion of Appearance and Motion Features for Daily Activity Recognition from Egocentric Perspective
title_short Fusion of Appearance and Motion Features for Daily Activity Recognition from Egocentric Perspective
title_sort fusion of appearance and motion features for daily activity recognition from egocentric perspective
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10422501/
https://www.ncbi.nlm.nih.gov/pubmed/37571588
http://dx.doi.org/10.3390/s23156804
work_keys_str_mv AT lyemohdharis fusionofappearanceandmotionfeaturesfordailyactivityrecognitionfromegocentricperspective
AT aldahoulnouar fusionofappearanceandmotionfeaturesfordailyactivityrecognitionfromegocentricperspective
AT abdulkarimhezerul fusionofappearanceandmotionfeaturesfordailyactivityrecognitionfromegocentricperspective