
Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video

Despite the great progress in 3D pose estimation from videos, there is still a lack of effective means to extract spatio-temporal features of different granularity from complex dynamic skeleton sequences. To tackle this problem, we propose a novel, skeleton-based spatio-temporal U-Net (STUNet) scheme...


Bibliographic Details
Main authors: Li, Weiwei; Du, Rong; Chen, Shudong
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9003032/
https://www.ncbi.nlm.nih.gov/pubmed/35408188
http://dx.doi.org/10.3390/s22072573
_version_ 1784686034329534464
author Li, Weiwei
Du, Rong
Chen, Shudong
author_facet Li, Weiwei
Du, Rong
Chen, Shudong
author_sort Li, Weiwei
collection PubMed
description Despite the great progress in 3D pose estimation from videos, there is still a lack of effective means to extract spatio-temporal features of different granularity from complex dynamic skeleton sequences. To tackle this problem, we propose a novel, skeleton-based spatio-temporal U-Net (STUNet) scheme to deal with spatio-temporal features in multiple scales for 3D human pose estimation in video. The proposed STUNet architecture consists of a cascade structure of semantic graph convolution layers and structural temporal dilated convolution layers, progressively extracting and fusing the spatio-temporal semantic features from fine-grained to coarse-grained. This U-shaped network achieves scale compression and feature squeezing by downscaling and upscaling, while abstracting multi-resolution spatio-temporal dependencies through skip connections. Experiments demonstrate that our model effectively captures comprehensive spatio-temporal features in multiple scales and achieves substantial improvements over mainstream methods on real-world datasets.
format Online
Article
Text
id pubmed-9003032
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-90030322022-04-13 Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video Li, Weiwei Du, Rong Chen, Shudong Sensors (Basel) Article Despite the great progress in 3D pose estimation from videos, there is still a lack of effective means to extract spatio-temporal features of different granularity from complex dynamic skeleton sequences. To tackle this problem, we propose a novel, skeleton-based spatio-temporal U-Net (STUNet) scheme to deal with spatio-temporal features in multiple scales for 3D human pose estimation in video. The proposed STUNet architecture consists of a cascade structure of semantic graph convolution layers and structural temporal dilated convolution layers, progressively extracting and fusing the spatio-temporal semantic features from fine-grained to coarse-grained. This U-shaped network achieves scale compression and feature squeezing by downscaling and upscaling, while abstracting multi-resolution spatio-temporal dependencies through skip connections. Experiments demonstrate that our model effectively captures comprehensive spatio-temporal features in multiple scales and achieves substantial improvements over mainstream methods on real-world datasets. MDPI 2022-03-28 /pmc/articles/PMC9003032/ /pubmed/35408188 http://dx.doi.org/10.3390/s22072573 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Li, Weiwei
Du, Rong
Chen, Shudong
Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video
title Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video
title_full Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video
title_fullStr Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video
title_full_unstemmed Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video
title_short Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video
title_sort skeleton-based spatio-temporal u-network for 3d human pose estimation in video
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9003032/
https://www.ncbi.nlm.nih.gov/pubmed/35408188
http://dx.doi.org/10.3390/s22072573
work_keys_str_mv AT liweiwei skeletonbasedspatiotemporalunetworkfor3dhumanposeestimationinvideo
AT durong skeletonbasedspatiotemporalunetworkfor3dhumanposeestimationinvideo
AT chenshudong skeletonbasedspatiotemporalunetworkfor3dhumanposeestimationinvideo