Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning
Main Authors: Liu, Mengyuan; Meng, Fanyang; Liang, Yongsheng
Format: Online Article Text
Language: English
Published: AAAS, 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10076048/ https://www.ncbi.nlm.nih.gov/pubmed/37040281 http://dx.doi.org/10.34133/cbsystems.0002
_version_ | 1785020051292684288 |
author | Liu, Mengyuan Meng, Fanyang Liang, Yongsheng |
author_facet | Liu, Mengyuan Meng, Fanyang Liang, Yongsheng |
author_sort | Liu, Mengyuan |
collection | PubMed |
description | Human action representation is derived from the description of human shape and motion. The traditional unsupervised 3-dimensional (3D) human action representation learning method uses a recurrent neural network (RNN)-based autoencoder to reconstruct the input pose sequence and then takes the midlevel feature of the autoencoder as the representation. Although an RNN can implicitly learn a certain amount of motion information, the extracted representation mainly describes the human shape and is insufficient to describe motion information. Therefore, we first present a handcrafted motion feature called pose flow to guide the reconstruction of the autoencoder, whose midlevel feature is expected to describe motion information. The performance is limited, as we observe that actions can be distinctive in either motion direction or motion norm. For example, we can distinguish "sitting down" and "standing up" by motion direction, yet distinguish "running" and "jogging" by motion norm. In these cases, it is difficult to learn distinctive features from pose flow, where direction and norm are mixed. To this end, we present an explicit pose decoupled flow network (PDF-E) to learn from direction and norm in a multi-task learning framework, where 1 encoder is used to generate the representation and 2 decoders are used to generate direction and norm, respectively. Further, we use reconstructing the input pose sequence as an additional constraint and present a generalized PDF network (PDF-G) to learn both motion and shape information, which achieves state-of-the-art performance on large-scale and challenging 3D action recognition datasets, including the NTU RGB+D 60 dataset and NTU RGB+D 120 dataset. |
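The decoupling described in the abstract can be illustrated with a minimal sketch. Assuming pose flow is the frame-to-frame displacement of each 3D joint (a common definition, not spelled out in this record), it splits per joint into a unit direction vector and a scalar norm; all names below are illustrative, not from the paper:

```python
import numpy as np

def pose_flow(poses):
    """Frame-to-frame joint displacement.

    poses: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    Returns flow of shape (T-1, J, 3).
    """
    return poses[1:] - poses[:-1]

def decouple(flow, eps=1e-8):
    """Split pose flow into direction (unit vectors) and norm (magnitudes)."""
    norm = np.linalg.norm(flow, axis=-1, keepdims=True)  # (T-1, J, 1)
    direction = flow / (norm + eps)                      # (T-1, J, 3)
    return direction, norm

# Toy example: 3 frames, 2 joints
poses = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 2.0, 0.0], [1.0, 1.0, 0.0]],
])
flow = pose_flow(poses)
direction, norm = decouple(flow)
```

In the paper's multi-task setup, `direction` and `norm` would serve as the two separate reconstruction targets for the two decoders, while a single encoder produces the shared representation.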
format | Online Article Text |
id | pubmed-10076048 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | AAAS |
record_format | MEDLINE/PubMed |
spelling | pubmed-100760482023-04-06 Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning Liu, Mengyuan Meng, Fanyang Liang, Yongsheng Cyborg Bionic Syst Research Article Human action representation is derived from the description of human shape and motion. The traditional unsupervised 3-dimensional (3D) human action representation learning method uses a recurrent neural network (RNN)-based autoencoder to reconstruct the input pose sequence and then takes the midlevel feature of the autoencoder as the representation. Although an RNN can implicitly learn a certain amount of motion information, the extracted representation mainly describes the human shape and is insufficient to describe motion information. Therefore, we first present a handcrafted motion feature called pose flow to guide the reconstruction of the autoencoder, whose midlevel feature is expected to describe motion information. The performance is limited, as we observe that actions can be distinctive in either motion direction or motion norm. For example, we can distinguish "sitting down" and "standing up" by motion direction, yet distinguish "running" and "jogging" by motion norm. In these cases, it is difficult to learn distinctive features from pose flow, where direction and norm are mixed. To this end, we present an explicit pose decoupled flow network (PDF-E) to learn from direction and norm in a multi-task learning framework, where 1 encoder is used to generate the representation and 2 decoders are used to generate direction and norm, respectively. Further, we use reconstructing the input pose sequence as an additional constraint and present a generalized PDF network (PDF-G) to learn both motion and shape information, which achieves state-of-the-art performance on large-scale and challenging 3D action recognition datasets, including the NTU RGB+D 60 dataset and NTU RGB+D 120 dataset.
AAAS 2022-12-30 2022 /pmc/articles/PMC10076048/ /pubmed/37040281 http://dx.doi.org/10.34133/cbsystems.0002 Text en Copyright © 2022 Mengyuan Liu et al. https://creativecommons.org/licenses/by/4.0/ Exclusive licensee Beijing Institute of Technology Press. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Liu, Mengyuan Meng, Fanyang Liang, Yongsheng Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning |
title | Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning |
title_full | Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning |
title_fullStr | Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning |
title_full_unstemmed | Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning |
title_short | Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning |
title_sort | generalized pose decoupled network for unsupervised 3d skeleton sequence-based action representation learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10076048/ https://www.ncbi.nlm.nih.gov/pubmed/37040281 http://dx.doi.org/10.34133/cbsystems.0002 |
work_keys_str_mv | AT liumengyuan generalizedposedecouplednetworkforunsupervised3dskeletonsequencebasedactionrepresentationlearning AT mengfanyang generalizedposedecouplednetworkforunsupervised3dskeletonsequencebasedactionrepresentationlearning AT liangyongsheng generalizedposedecouplednetworkforunsupervised3dskeletonsequencebasedactionrepresentationlearning |