
Multi-cue temporal modeling for skeleton-based sign language recognition


Bibliographic Details
Main Authors: Özdemir, Oğulcan, Baytaş, İnci M., Akarun, Lale
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10113557/
https://www.ncbi.nlm.nih.gov/pubmed/37090797
http://dx.doi.org/10.3389/fnins.2023.1148191
_version_ 1785027865593511936
author Özdemir, Oğulcan
Baytaş, İnci M.
Akarun, Lale
author_facet Özdemir, Oğulcan
Baytaş, İnci M.
Akarun, Lale
author_sort Özdemir, Oğulcan
collection PubMed
description Sign languages are visual languages used as the primary communication medium for the Deaf community. The signs comprise manual and non-manual articulators such as hand shapes, upper body movement, and facial expressions. Sign Language Recognition (SLR) aims to learn spatial and temporal representations from the videos of the signs. Most SLR studies focus on manual features, often extracted from the shape of the dominant hand or the entire frame. However, facial expressions combined with hand and body gestures may also play a significant role in discriminating the context represented in the sign videos. In this study, we propose an isolated SLR framework based on Spatial-Temporal Graph Convolutional Networks (ST-GCNs) and Multi-Cue Long Short-Term Memories (MC-LSTMs) to exploit multi-articulatory (e.g., body, hands, and face) information for recognizing sign glosses. We train an ST-GCN model for learning representations from the upper body and hands. Meanwhile, spatial embeddings of hand shape and facial expression cues are extracted from Convolutional Neural Networks (CNNs) pre-trained on large-scale hand and facial expression datasets. Thus, the proposed framework, coupling ST-GCNs with MC-LSTMs for multi-articulatory temporal modeling, can provide insights into the contribution of each visual Sign Language (SL) cue to recognition performance. To evaluate the proposed framework, we conducted extensive analyses on two Turkish SL benchmark datasets with different linguistic properties, BosphorusSign22k and AUTSL. While we obtained recognition performance comparable to the skeleton-based state-of-the-art, we observed that incorporating multiple visual SL cues improves the recognition performance, especially in certain sign classes where multi-cue information is vital. The code is available at: https://github.com/ogulcanozdemir/multicue-slr.
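A minimal sketch of the multi-cue fusion idea described in the abstract, assuming PyTorch. The module and parameter names (MultiCueLSTM, skel_dim, hand_dim, face_dim) are illustrative assumptions and are not taken from the authors' repository; the per-frame skeleton branch is stubbed with a linear projection standing in for pooled ST-GCN features, and the hand-shape and facial-expression embeddings are assumed to be precomputed by pre-trained CNNs.

import torch
import torch.nn as nn


class MultiCueLSTM(nn.Module):
    # Fuses per-frame skeleton, hand-shape, and facial-expression cues,
    # then models their temporal evolution with an LSTM (hypothetical names).
    def __init__(self, skel_dim, hand_dim, face_dim, hidden_dim, num_classes):
        super().__init__()
        # Stand-in for pooled per-frame ST-GCN features.
        self.skel_proj = nn.Linear(skel_dim, hidden_dim)
        # The LSTM consumes the concatenated cue embeddings at every time step.
        self.lstm = nn.LSTM(hidden_dim + hand_dim + face_dim, hidden_dim,
                            batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, skel, hand, face):
        # skel: (B, T, skel_dim); hand: (B, T, hand_dim); face: (B, T, face_dim)
        fused = torch.cat([self.skel_proj(skel), hand, face], dim=-1)
        _, (h_n, _) = self.lstm(fused)    # h_n: (1, B, hidden_dim)
        return self.classifier(h_n[-1])   # gloss logits: (B, num_classes)


if __name__ == "__main__":
    # Toy shapes only: 25 joints x 3 coordinates = 75; 226 glosses as in AUTSL.
    model = MultiCueLSTM(skel_dim=75, hand_dim=128, face_dim=128,
                         hidden_dim=256, num_classes=226)
    B, T = 2, 60
    logits = model(torch.randn(B, T, 75), torch.randn(B, T, 128),
                   torch.randn(B, T, 128))
    print(logits.shape)  # torch.Size([2, 226])

In the paper's full framework the skeleton branch is an ST-GCN trained on body and hand key points; it is reduced to a projection here purely to keep the sketch self-contained.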
format Online
Article
Text
id pubmed-10113557
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10113557 2023-04-20 Multi-cue temporal modeling for skeleton-based sign language recognition Özdemir, Oğulcan Baytaş, İnci M. Akarun, Lale Front Neurosci Neuroscience Sign languages are visual languages used as the primary communication medium for the Deaf community. The signs comprise manual and non-manual articulators such as hand shapes, upper body movement, and facial expressions. Sign Language Recognition (SLR) aims to learn spatial and temporal representations from the videos of the signs. Most SLR studies focus on manual features, often extracted from the shape of the dominant hand or the entire frame. However, facial expressions combined with hand and body gestures may also play a significant role in discriminating the context represented in the sign videos. In this study, we propose an isolated SLR framework based on Spatial-Temporal Graph Convolutional Networks (ST-GCNs) and Multi-Cue Long Short-Term Memories (MC-LSTMs) to exploit multi-articulatory (e.g., body, hands, and face) information for recognizing sign glosses. We train an ST-GCN model for learning representations from the upper body and hands. Meanwhile, spatial embeddings of hand shape and facial expression cues are extracted from Convolutional Neural Networks (CNNs) pre-trained on large-scale hand and facial expression datasets. Thus, the proposed framework, coupling ST-GCNs with MC-LSTMs for multi-articulatory temporal modeling, can provide insights into the contribution of each visual Sign Language (SL) cue to recognition performance. To evaluate the proposed framework, we conducted extensive analyses on two Turkish SL benchmark datasets with different linguistic properties, BosphorusSign22k and AUTSL. While we obtained recognition performance comparable to the skeleton-based state-of-the-art, we observed that incorporating multiple visual SL cues improves the recognition performance, especially in certain sign classes where multi-cue information is vital. The code is available at: https://github.com/ogulcanozdemir/multicue-slr. Frontiers Media S.A. 2023-04-05 /pmc/articles/PMC10113557/ /pubmed/37090797 http://dx.doi.org/10.3389/fnins.2023.1148191 Text en Copyright © 2023 Özdemir, Baytaş and Akarun. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Özdemir, Oğulcan
Baytaş, İnci M.
Akarun, Lale
Multi-cue temporal modeling for skeleton-based sign language recognition
title Multi-cue temporal modeling for skeleton-based sign language recognition
title_full Multi-cue temporal modeling for skeleton-based sign language recognition
title_fullStr Multi-cue temporal modeling for skeleton-based sign language recognition
title_full_unstemmed Multi-cue temporal modeling for skeleton-based sign language recognition
title_short Multi-cue temporal modeling for skeleton-based sign language recognition
title_sort multi-cue temporal modeling for skeleton-based sign language recognition
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10113557/
https://www.ncbi.nlm.nih.gov/pubmed/37090797
http://dx.doi.org/10.3389/fnins.2023.1148191
work_keys_str_mv AT ozdemirogulcan multicuetemporalmodelingforskeletonbasedsignlanguagerecognition
AT baytasincim multicuetemporalmodelingforskeletonbasedsignlanguagerecognition
AT akarunlale multicuetemporalmodelingforskeletonbasedsignlanguagerecognition