Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video
Despite the great progress in 3D pose estimation from videos, there is still a lack of effective means to extract spatio-temporal features of different granularity from complex dynamic skeleton sequences. To tackle this problem, we propose a novel, skeleton-based spatio-temporal U-Net (STUNet) scheme...
Main authors: | Li, Weiwei; Du, Rong; Chen, Shudong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9003032/ https://www.ncbi.nlm.nih.gov/pubmed/35408188 http://dx.doi.org/10.3390/s22072573 |
author | Li, Weiwei; Du, Rong; Chen, Shudong |
---|---|
collection | PubMed |
description | Despite the great progress in 3D pose estimation from videos, there is still a lack of effective means to extract spatio-temporal features of different granularity from complex dynamic skeleton sequences. To tackle this problem, we propose a novel, skeleton-based spatio-temporal U-Net (STUNet) scheme to deal with spatio-temporal features in multiple scales for 3D human pose estimation in video. The proposed STUNet architecture consists of a cascade structure of semantic graph convolution layers and structural temporal dilated convolution layers, progressively extracting and fusing the spatio-temporal semantic features from fine-grained to coarse-grained. This U-shaped network achieves scale compression and feature squeezing by downscaling and upscaling, while abstracting multi-resolution spatio-temporal dependencies through skip connections. Experiments demonstrate that our model effectively captures comprehensive spatio-temporal features in multiple scales and achieves substantial improvements over mainstream methods on real-world datasets. |
format | Online Article Text |
id | pubmed-9003032 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
journal | Sensors (Basel) |
publication date | 2022-03-28 |
license | © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video |
topic | Article |
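The abstract describes the STUNet architecture only at a high level. The following is a rough, hypothetical sketch in plain Python of the U-shaped data flow it names, not the authors' implementation. Every function name here (`toy_st_unet`, `graph_conv`, `temporal_dilated_conv`, and the rest) is invented for illustration. The sketch also makes several simplifying substitutions. The learned semantic graph convolution becomes an unweighted neighbour average. The learned temporal kernels become a fixed smoothing kernel. Downscaling and upscaling become frame-pair averaging and frame repetition, and each joint carries a single scalar feature.

```python
from typing import List, Sequence

Seq = List[List[float]]  # seq[t][j]: one scalar feature for joint j at frame t (toy assumption)


def graph_conv(frame: List[float], adj: Sequence[Sequence[int]]) -> List[float]:
    # Spatial step: average each joint with itself and its skeleton neighbours.
    # Plain unweighted graph convolution, standing in for semantic graph convolution.
    out = []
    for j, row in enumerate(adj):
        neigh = [k for k, linked in enumerate(row) if linked or k == j]
        out.append(sum(frame[k] for k in neigh) / len(neigh))
    return out


def temporal_dilated_conv(seq: Seq, kernel: Sequence[float], dilation: int) -> Seq:
    # Temporal step: centred 1D convolution along time for each joint (odd kernel length),
    # sampling frames `dilation` apart; out-of-range frames are clamped to the ends.
    T, J = len(seq), len(seq[0])
    half = len(kernel) // 2

    def frame_at(t: int) -> List[float]:
        return seq[min(max(t, 0), T - 1)]

    return [[sum(w * frame_at(t + (i - half) * dilation)[j] for i, w in enumerate(kernel))
             for j in range(J)]
            for t in range(T)]


def st_block(seq: Seq, adj: Sequence[Sequence[int]], kernel: Sequence[float],
             dilation: int) -> Seq:
    # One spatio-temporal stage: graph convolution on every frame, then temporal convolution.
    return temporal_dilated_conv([graph_conv(f, adj) for f in seq], kernel, dilation)


def downscale(seq: Seq) -> Seq:
    # Halve the temporal resolution by averaging consecutive frame pairs.
    return [[(a + b) / 2 for a, b in zip(seq[t], seq[t + 1])]
            for t in range(0, len(seq) - 1, 2)]


def upscale(seq: Seq) -> Seq:
    # Double the temporal resolution by repeating each frame.
    return [list(frame) for frame in seq for _ in range(2)]


def toy_st_unet(seq: Seq, adj: Sequence[Sequence[int]], depth: int = 2,
                kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> Seq:
    if len(seq) % (2 ** depth):
        raise ValueError("frame count must be divisible by 2**depth")
    skips, x = [], seq
    for level in range(depth):                  # encoder: fine-grained -> coarse-grained
        x = st_block(x, adj, kernel, dilation=2 ** level)
        skips.append(x)                         # keep this scale for the skip connection
        x = downscale(x)
    x = st_block(x, adj, kernel, dilation=1)    # bottleneck at the coarsest temporal scale
    for skip in reversed(skips):                # decoder: coarse-grained -> fine-grained
        up = upscale(x)
        x = [[u + s for u, s in zip(fu, fs)] for fu, fs in zip(up, skip)]  # additive skip
        x = st_block(x, adj, kernel, dilation=1)
    return x


if __name__ == "__main__":
    chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # hypothetical 3-joint chain skeleton
    frames = [[float(t + j) for j in range(3)] for t in range(8)]
    out = toy_st_unet(frames, chain)
    print(len(out), len(out[0]))  # 8 3
```

With `depth=2`, an 8-frame clip is processed at 8, 4 and 2 frames and then restored to 8. At each restored scale, the encoder output saved at that scale is added back in. The additive skip is only one option chosen for brevity. Concatenation followed by a projection is an equally common way to fuse U-Net skip features.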