
One Model is Not Enough: Ensembles for Isolated Sign Language Recognition


Bibliographic Details
Main authors:	Hrúz, Marek; Gruber, Ivan; Kanis, Jakub; Boháček, Matyáš; Hlaváč, Miroslav; Krňoul, Zdeněk
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9269724/ https://www.ncbi.nlm.nih.gov/pubmed/35808537 http://dx.doi.org/10.3390/s22135043

Description
Summary:	In this paper, we dive into sign language recognition, focusing on the recognition of isolated signs. The task is defined as a classification problem, where a sequence of frames (i.e., images) is recognized as one of the given sign language glosses. We analyze two appearance-based approaches, I3D and TimeSformer, and one pose-based approach, SPOTER. The appearance-based approaches are trained on a few different data modalities, whereas the performance of SPOTER is evaluated on different types of preprocessing. All the methods are tested on two publicly available datasets: AUTSL and WLASL300. We experiment with ensemble techniques to achieve new state-of-the-art results of 73.84% accuracy on the WLASL300 dataset by using the CMA-ES optimization method to find the best ensemble weight parameters. Furthermore, we present an ensembling technique based on the Transformer model, which we call Neural Ensembler.
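
Illustrative note: the summary describes combining several classifiers through weighted ensembling, with the weights found by an optimizer. The sketch below is one rough way to picture that idea and is not the authors' implementation. It assumes each model outputs a list of per-class probabilities, and it uses a plain random search as a simple stand-in for CMA-ES. The function names (`ensemble_predict`, `accuracy`, `random_search_weights`) and the toy validation data are hypothetical and were made up for this example.

```python
import random


def ensemble_predict(probs_per_model, weights):
    """Return the class index with the highest weighted sum of probabilities."""
    n_classes = len(probs_per_model[0])
    combined = [0.0] * n_classes
    for probs, w in zip(probs_per_model, weights):
        for c, p in enumerate(probs):
            combined[c] += w * p
    return max(range(n_classes), key=combined.__getitem__)


def accuracy(samples, labels, weights):
    """Fraction of samples whose ensemble prediction matches the label."""
    correct = sum(ensemble_predict(s, weights) == y for s, y in zip(samples, labels))
    return correct / len(labels)


def random_search_weights(samples, labels, n_models, iters=200, seed=0):
    """Random search over normalized weights (a stand-in for CMA-ES, assumed here)."""
    rng = random.Random(seed)
    best_w = [1.0 / n_models] * n_models  # start from a uniform ensemble
    best_acc = accuracy(samples, labels, best_w)
    for _ in range(iters):
        raw = [rng.random() for _ in range(n_models)]
        total = sum(raw)
        if total == 0.0:
            continue  # skip the degenerate all-zero draw
        w = [r / total for r in raw]
        acc = accuracy(samples, labels, w)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy validation set (invented): 3 samples, 2 models, 2 classes.
# Each sample holds one probability vector per model.
val_samples = [
    [[0.6, 0.4], [0.1, 0.9]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.2, 0.8], [0.3, 0.7]],
]
val_labels = [0, 0, 1]

uniform_acc = accuracy(val_samples, val_labels, [0.5, 0.5])
best_w, best_acc = random_search_weights(val_samples, val_labels, n_models=2)
print("uniform accuracy:", uniform_acc)
print("searched weights:", best_w, "accuracy:", best_acc)
```

In this toy setup the first sample is only classified correctly when the first model receives most of the weight, which is the kind of imbalance a weight search can exploit. In practice the weights would be tuned on held-out validation data and reported on a separate test set.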
