Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning

Bibliographic Details
Main Authors: Liu, Mengyuan; Meng, Fanyang; Liang, Yongsheng
Format: Online Article Text
Language: English
Published: AAAS 2022
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10076048/
https://www.ncbi.nlm.nih.gov/pubmed/37040281
http://dx.doi.org/10.34133/cbsystems.0002
author Liu, Mengyuan
Meng, Fanyang
Liang, Yongsheng
collection PubMed
description Human action representation is derived from the description of human shape and motion. The traditional unsupervised 3-dimensional (3D) human action representation learning method uses a recurrent neural network (RNN)-based autoencoder to reconstruct the input pose sequence and then takes the midlevel feature of the autoencoder as the representation. Although an RNN can implicitly learn a certain amount of motion information, the extracted representation mainly describes the human shape and is insufficient to describe motion information. Therefore, we first present a handcrafted motion feature called pose flow to guide the reconstruction of the autoencoder, whose midlevel feature is expected to describe motion information. However, the performance is limited because, as we observe, actions can be distinctive in either motion direction or motion norm. For example, we can distinguish "sitting down" from "standing up" by motion direction, yet distinguish "running" from "jogging" by motion norm. In these cases, it is difficult to learn distinctive features from pose flow, where direction and norm are mixed. To this end, we present an explicit pose decoupled flow network (PDF-E) that learns from direction and norm in a multi-task learning framework, where 1 encoder is used to generate the representation and 2 decoders are used to generate direction and norm, respectively. Further, we use reconstruction of the input pose sequence as an additional constraint and present a generalized PDF network (PDF-G) to learn both motion and shape information, which achieves state-of-the-art performance on large-scale and challenging 3D action recognition datasets, including the NTU RGB+D 60 and NTU RGB+D 120 datasets.
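The decoupling described in the abstract admits a compact numerical form. Below is a minimal NumPy sketch of how pose flow can be computed from a skeleton sequence and split into the two reconstruction targets (direction and norm); the (T, J, 3) array layout and the function names pose_flow and decouple are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def pose_flow(poses: np.ndarray) -> np.ndarray:
    """Handcrafted motion feature: frame-to-frame joint displacement.

    poses: (T, J, 3) array of 3D joint coordinates over T frames.
    Returns a (T-1, J, 3) array of displacement vectors.
    """
    return poses[1:] - poses[:-1]

def decouple(flow: np.ndarray, eps: float = 1e-8):
    """Split pose flow into unit motion direction and motion norm.

    In a PDF-E-style multi-task setup, these two arrays would serve
    as the separate targets for the 2 decoders, so that direction
    and norm are no longer mixed in a single reconstruction target.
    """
    norm = np.linalg.norm(flow, axis=-1, keepdims=True)  # (T-1, J, 1)
    direction = flow / (norm + eps)                      # (T-1, J, 3)
    return direction, norm
```

For example, a 2-second clip at 30 frames per second with the 25-joint NTU RGB+D skeleton layout gives poses of shape (60, 25, 3), a direction target of shape (59, 25, 3), and a norm target of shape (59, 25, 1).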
format Online
Article
Text
id pubmed-10076048
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher AAAS
record_format MEDLINE/PubMed
spelling pubmed-10076048 2023-04-06 Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning. Liu, Mengyuan; Meng, Fanyang; Liang, Yongsheng. Cyborg Bionic Syst, Research Article. AAAS 2022-12-30 2022 /pmc/articles/PMC10076048/ /pubmed/37040281 http://dx.doi.org/10.34133/cbsystems.0002 Text en Copyright © 2022 Mengyuan Liu et al. https://creativecommons.org/licenses/by/4.0/ Exclusive licensee Beijing Institute of Technology Press. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/).
title Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10076048/
https://www.ncbi.nlm.nih.gov/pubmed/37040281
http://dx.doi.org/10.34133/cbsystems.0002