A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production
Main Authors: Cui, Zhenchao; Chen, Ziang; Li, Zhaoxin; Wang, Zhaoqi
Format: Online Article Text
Language: English
Published: MDPI, 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9785616/ https://www.ncbi.nlm.nih.gov/pubmed/36559975 http://dx.doi.org/10.3390/s22249606
_version_ | 1784858092030132224 |
author | Cui, Zhenchao Chen, Ziang Li, Zhaoxin Wang, Zhaoqi |
author_facet | Cui, Zhenchao Chen, Ziang Li, Zhaoxin Wang, Zhaoqi |
author_sort | Cui, Zhenchao |
collection | PubMed |
description | As a typical sequence-to-sequence task, sign language production (SLP) aims to automatically translate spoken language sentences into the corresponding sign language sequences. Existing SLP methods fall into two categories: autoregressive and non-autoregressive. Autoregressive methods suffer from high latency and error accumulation caused by the long-term dependence between the current output and the previous poses, while non-autoregressive methods suffer from repetition and omission during the parallel decoding process. To remedy these issues, we propose a novel method named Pyramid Semi-Autoregressive Transformer with Rich Semantics (PSAT-RS). In PSAT-RS, we first introduce a pyramid semi-autoregressive mechanism that divides the target sequence into groups in a coarse-to-fine manner, which globally keeps the autoregressive property while locally generating target frames in parallel. Meanwhile, a relaxed masked attention mechanism is adopted so that the decoder not only captures the pose sequences in the previous groups but also attends to the current group. Finally, considering the importance of spatial-temporal information, we also design a Rich Semantics embedding (RS) module that encodes sequential information, in both the time dimension and spatial displacement, into the same high-dimensional space. This significantly improves the coordination of joint motion, making the generated sign language videos more natural. Results of experiments conducted on the RWTH-PHOENIX-Weather-2014T and CSL datasets show that the proposed PSAT-RS is competitive with state-of-the-art autoregressive and non-autoregressive SLP models, achieving a better trade-off between speed and accuracy. |
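The record does not link a reference implementation, so the sketch below is only an illustration of the two mechanisms the abstract names: the relaxed group-causal attention mask (each query frame may attend to all frames in earlier groups plus its own group) and a rich-semantics embedding that sums a temporal position embedding with a projection of per-frame joint displacements. All names, shapes, and hyperparameters here (group_causal_mask, RichSemanticsEmbedding, group_size, d_model, 2-D joint coordinates) are hypothetical reconstructions from the abstract, not the authors' code.

```python
# Illustrative sketch only: hypothetical reconstruction of two PSAT-RS ideas
# described in the abstract; not the authors' implementation.
import torch
import torch.nn as nn


def group_causal_mask(seq_len: int, group_size: int) -> torch.Tensor:
    """Relaxed masked attention: query frame i may attend to every frame in
    earlier groups (autoregressive across groups) and to every frame of its
    own group (parallel within a group). Returns True where attention is
    permitted."""
    group_ids = torch.arange(seq_len) // group_size              # group index per frame
    return group_ids.unsqueeze(0) <= group_ids.unsqueeze(1)     # key group <= query group


class RichSemanticsEmbedding(nn.Module):
    """Hypothetical RS embedding: encode temporal order and frame-to-frame
    spatial displacement of the pose into one shared d_model space."""

    def __init__(self, num_joints: int, d_model: int, max_len: int = 512):
        super().__init__()
        self.time_emb = nn.Embedding(max_len, d_model)           # temporal position
        self.disp_proj = nn.Linear(num_joints * 2, d_model)      # 2-D joint displacements

    def forward(self, poses: torch.Tensor) -> torch.Tensor:
        # poses: (batch, seq_len, num_joints * 2) flattened 2-D joint coordinates
        _, t, _ = poses.shape
        # displacement of each frame from its predecessor (first frame: zero)
        disp = poses - torch.cat([poses[:, :1], poses[:, :-1]], dim=1)
        positions = torch.arange(t, device=poses.device)
        return self.time_emb(positions)[None] + self.disp_proj(disp)


if __name__ == "__main__":
    mask = group_causal_mask(seq_len=8, group_size=4)
    print(mask.int())   # two 4-frame groups: block lower-triangular pattern
    rs = RichSemanticsEmbedding(num_joints=50, d_model=256)
    print(rs(torch.randn(2, 8, 100)).shape)  # torch.Size([2, 8, 256])
```

The boolean convention (True = may attend) matches the attn_mask argument of torch.nn.functional.scaled_dot_product_attention, so the mask can be passed to a standard decoder directly; the coarse-to-fine pyramid aspect would amount to re-running decoding with progressively smaller group_size values.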
format | Online Article Text |
id | pubmed-9785616 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-97856162022-12-24 A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production Cui, Zhenchao Chen, Ziang Li, Zhaoxin Wang, Zhaoqi Sensors (Basel) Article MDPI 2022-12-08 /pmc/articles/PMC9785616/ /pubmed/36559975 http://dx.doi.org/10.3390/s22249606 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Cui, Zhenchao Chen, Ziang Li, Zhaoxin Wang, Zhaoqi A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production |
title | A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production |
title_full | A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production |
title_fullStr | A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production |
title_full_unstemmed | A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production |
title_short | A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production |
title_sort | pyramid semi-autoregressive transformer with rich semantics for sign language production |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9785616/ https://www.ncbi.nlm.nih.gov/pubmed/36559975 http://dx.doi.org/10.3390/s22249606 |
work_keys_str_mv | AT cuizhenchao apyramidsemiautoregressivetransformerwithrichsemanticsforsignlanguageproduction AT chenziang apyramidsemiautoregressivetransformerwithrichsemanticsforsignlanguageproduction AT lizhaoxin apyramidsemiautoregressivetransformerwithrichsemanticsforsignlanguageproduction AT wangzhaoqi apyramidsemiautoregressivetransformerwithrichsemanticsforsignlanguageproduction AT cuizhenchao pyramidsemiautoregressivetransformerwithrichsemanticsforsignlanguageproduction AT chenziang pyramidsemiautoregressivetransformerwithrichsemanticsforsignlanguageproduction AT lizhaoxin pyramidsemiautoregressivetransformerwithrichsemanticsforsignlanguageproduction AT wangzhaoqi pyramidsemiautoregressivetransformerwithrichsemanticsforsignlanguageproduction |