
A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera

We present a deep learning-based multitask framework for joint 3D human pose estimation and action recognition from RGB sensors using simple cameras. The approach proceeds along two stages. In the first, a real-time 2D pose detector is run to determine the precise pixel location of important keypoints of the human body. A two-stream deep neural network is then designed and trained to map detected 2D keypoints into 3D poses. In the second stage, the Efficient Neural Architecture Search (ENAS) algorithm is deployed to find an optimal network architecture that is used for modeling the spatio-temporal evolution of the estimated 3D poses via an image-based intermediate representation and performing action recognition. Experiments on Human3.6M, MSR Action3D and SBU Kinect Interaction datasets verify the effectiveness of the proposed method on the targeted tasks. Moreover, we show that the method requires a low computational budget for training and inference. In particular, the experimental results show that by using a monocular RGB sensor, we can develop a 3D pose estimation and human action recognition approach that reaches the performance of RGB-depth sensors. This opens up many opportunities for leveraging RGB cameras (which are much cheaper than depth cameras and extensively deployed in private and public places) to build intelligent recognition systems.
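The abstract describes a two-stage pipeline: per-frame 2D keypoint detection, 2D-to-3D lifting with a two-stream network, encoding of the 3D pose sequence as an image-like representation, and action classification with an ENAS-discovered architecture. The sketch below illustrates that data flow only; every function in it is a hypothetical stand-in, not the authors' published code.

# Hypothetical sketch of the two-stage pipeline described in the abstract.
# Stage 1: detect 2D keypoints per frame, then lift them to 3D poses.
# Stage 2: encode the 3D pose sequence as an image and classify the action.
# All functions are illustrative stubs, not the authors' implementation.

import numpy as np

NUM_JOINTS = 17  # a common skeleton size; the paper's joint count may differ

def detect_2d_keypoints(frame: np.ndarray) -> np.ndarray:
    """Stand-in for a real-time 2D pose detector: (NUM_JOINTS, 2) pixel coords."""
    h, w = frame.shape[:2]
    return np.random.rand(NUM_JOINTS, 2) * [w, h]

def lift_to_3d(keypoints_2d: np.ndarray) -> np.ndarray:
    """Stand-in for the two-stream 2D-to-3D lifting network: (NUM_JOINTS, 3)."""
    depth = np.zeros((NUM_JOINTS, 1))  # a learned network would predict depth
    return np.concatenate([keypoints_2d, depth], axis=1)

def poses_to_image(pose_sequence: np.ndarray) -> np.ndarray:
    """Encode a (T, NUM_JOINTS, 3) pose sequence as a pseudo-image
    (joints x time x xyz, scaled to [0, 255]) so a CNN can model its
    spatio-temporal evolution."""
    seq = pose_sequence.transpose(1, 0, 2)  # (NUM_JOINTS, T, 3)
    lo, hi = seq.min(), seq.max()
    return ((seq - lo) / (hi - lo + 1e-8) * 255).astype(np.uint8)

def recognize_action(pose_image: np.ndarray) -> int:
    """Stand-in for the ENAS-discovered classifier; returns an action label."""
    return int(pose_image.mean()) % 10  # placeholder decision rule

video = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(30)]
poses_3d = np.stack([lift_to_3d(detect_2d_keypoints(f)) for f in video])
label = recognize_action(poses_to_image(poses_3d))
print("predicted action class:", label)

The poses_to_image step mirrors the paper's idea of turning the estimated 3D pose sequence into an image-based intermediate representation, which lets an image classifier model the motion over time.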


Bibliographic Details
Main Authors: Pham, Huy Hieu, Salmane, Houssam, Khoudour, Louahdi, Crouzil, Alain, Velastin, Sergio A., Zegers, Pablo
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7180926/
https://www.ncbi.nlm.nih.gov/pubmed/32218350
http://dx.doi.org/10.3390/s20071825
_version_ 1783525933960921088
author Pham, Huy Hieu
Salmane, Houssam
Khoudour, Louahdi
Crouzil, Alain
Velastin, Sergio A.
Zegers, Pablo
author_facet Pham, Huy Hieu
Salmane, Houssam
Khoudour, Louahdi
Crouzil, Alain
Velastin, Sergio A.
Zegers, Pablo
author_sort Pham, Huy Hieu
collection PubMed
description We present a deep learning-based multitask framework for joint 3D human pose estimation and action recognition from RGB sensors using simple cameras. The approach proceeds along two stages. In the first, a real-time 2D pose detector is run to determine the precise pixel location of important keypoints of the human body. A two-stream deep neural network is then designed and trained to map detected 2D keypoints into 3D poses. In the second stage, the Efficient Neural Architecture Search (ENAS) algorithm is deployed to find an optimal network architecture that is used for modeling the spatio-temporal evolution of the estimated 3D poses via an image-based intermediate representation and performing action recognition. Experiments on Human3.6M, MSR Action3D and SBU Kinect Interaction datasets verify the effectiveness of the proposed method on the targeted tasks. Moreover, we show that the method requires a low computational budget for training and inference. In particular, the experimental results show that by using a monocular RGB sensor, we can develop a 3D pose estimation and human action recognition approach that reaches the performance of RGB-depth sensors. This opens up many opportunities for leveraging RGB cameras (which are much cheaper than depth cameras and extensively deployed in private and public places) to build intelligent recognition systems.
format Online
Article
Text
id pubmed-7180926
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7180926 2020-04-30 A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera Pham, Huy Hieu Salmane, Houssam Khoudour, Louahdi Crouzil, Alain Velastin, Sergio A. Zegers, Pablo Sensors (Basel) Article We present a deep learning-based multitask framework for joint 3D human pose estimation and action recognition from RGB sensors using simple cameras. The approach proceeds along two stages. In the first, a real-time 2D pose detector is run to determine the precise pixel location of important keypoints of the human body. A two-stream deep neural network is then designed and trained to map detected 2D keypoints into 3D poses. In the second stage, the Efficient Neural Architecture Search (ENAS) algorithm is deployed to find an optimal network architecture that is used for modeling the spatio-temporal evolution of the estimated 3D poses via an image-based intermediate representation and performing action recognition. Experiments on Human3.6M, MSR Action3D and SBU Kinect Interaction datasets verify the effectiveness of the proposed method on the targeted tasks. Moreover, we show that the method requires a low computational budget for training and inference. In particular, the experimental results show that by using a monocular RGB sensor, we can develop a 3D pose estimation and human action recognition approach that reaches the performance of RGB-depth sensors. This opens up many opportunities for leveraging RGB cameras (which are much cheaper than depth cameras and extensively deployed in private and public places) to build intelligent recognition systems. MDPI 2020-03-25 /pmc/articles/PMC7180926/ /pubmed/32218350 http://dx.doi.org/10.3390/s20071825 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Pham, Huy Hieu
Salmane, Houssam
Khoudour, Louahdi
Crouzil, Alain
Velastin, Sergio A.
Zegers, Pablo
A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera
title A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera
title_full A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera
title_fullStr A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera
title_full_unstemmed A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera
title_short A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera
title_sort unified deep framework for joint 3d pose estimation and action recognition from a single rgb camera
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7180926/
https://www.ncbi.nlm.nih.gov/pubmed/32218350
http://dx.doi.org/10.3390/s20071825
work_keys_str_mv AT phamhuyhieu aunifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera
AT salmanehoussam aunifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera
AT khoudourlouahdi aunifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera
AT crouzilalain aunifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera
AT velastinsergioa aunifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera
AT zegerspablo aunifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera
AT phamhuyhieu unifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera
AT salmanehoussam unifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera
AT khoudourlouahdi unifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera
AT crouzilalain unifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera
AT velastinsergioa unifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera
AT zegerspablo unifieddeepframeworkforjoint3dposeestimationandactionrecognitionfromasinglergbcamera