
Two-Stream Mixed Convolutional Neural Network for American Sign Language Recognition


Bibliographic Details
Main Authors: Ma, Ying, Xu, Tianpei, Kim, Kangchul
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9414785/
https://www.ncbi.nlm.nih.gov/pubmed/36015719
http://dx.doi.org/10.3390/s22165959
_version_ 1784776072752005120
author Ma, Ying
Xu, Tianpei
Kim, Kangchul
author_facet Ma, Ying
Xu, Tianpei
Kim, Kangchul
author_sort Ma, Ying
collection PubMed
description The Convolutional Neural Network (CNN) has demonstrated excellent performance in image recognition and has brought new opportunities for sign language recognition. However, the features undergo many nonlinear transformations during the convolutional operation, and traditional CNN models are insufficient at modeling the correlation between images. In American Sign Language (ASL) recognition, the letters J and Z, which involve moving gestures, pose recognition challenges. This paper proposes a novel Two-Stream Mixed (TSM) method with feature extraction and fusion operations to improve the correlation of feature expression between two time-consecutive images of a dynamic gesture. The proposed TSM-CNN system is composed of preprocessing, the TSM block, and CNN classifiers. Two consecutive images of the dynamic gesture are used as the inputs of the two streams, and resizing, transformation, and augmentation are carried out in the preprocessing stage. The fusion feature map obtained by addition and concatenation in the TSM block is used as the input of the classifiers. Finally, a classifier classifies the images. The TSM-CNN model with the highest performance score among the three concatenation methods is selected as the definitive recognition model for ASL recognition. We design four CNN models with TSM: TSM-LeNet, TSM-AlexNet, TSM-ResNet18, and TSM-ResNet50. The experimental results show that the CNN models with the TSM outperform the models without it. TSM-ResNet50 achieves the best accuracy of 97.57% on the MNIST and ASL datasets and can be applied to an RGB image sensing system for hearing-impaired people.
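The abstract describes the TSM block as fusing the feature maps of two consecutive frames by addition and concatenation. The paper's exact layer arrangement is not given here, so the following is only a minimal NumPy sketch of that fusion idea: the function name `tsm_fuse` and the choice to stack the addition result on top of the channel-wise concatenation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def tsm_fuse(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Hypothetical fusion of two per-frame feature maps (C, H, W).

    Element-wise addition blends shared structure across the two
    frames; channel-wise concatenation preserves each stream intact.
    Both results are stacked into one (3C, H, W) fusion feature map.
    """
    assert feat_a.shape == feat_b.shape, "streams must produce equal shapes"
    added = feat_a + feat_b                                   # (C, H, W)
    concatenated = np.concatenate([feat_a, feat_b], axis=0)   # (2C, H, W)
    return np.concatenate([added, concatenated], axis=0)      # (3C, H, W)

# Dummy 8-channel feature maps for two consecutive gesture frames
frame1 = np.random.rand(8, 28, 28)
frame2 = np.random.rand(8, 28, 28)
fused = tsm_fuse(frame1, frame2)
print(fused.shape)  # (24, 28, 28)
```

In a full system, the fused map would then be fed to one of the CNN classifiers (LeNet, AlexNet, ResNet18, or ResNet50) as described in the abstract.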
format Online
Article
Text
id pubmed-9414785
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9414785 2022-08-27 Two-Stream Mixed Convolutional Neural Network for American Sign Language Recognition Ma, Ying Xu, Tianpei Kim, Kangchul Sensors (Basel) Article MDPI 2022-08-09 /pmc/articles/PMC9414785/ /pubmed/36015719 http://dx.doi.org/10.3390/s22165959 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Ma, Ying
Xu, Tianpei
Kim, Kangchul
Two-Stream Mixed Convolutional Neural Network for American Sign Language Recognition
title Two-Stream Mixed Convolutional Neural Network for American Sign Language Recognition
title_full Two-Stream Mixed Convolutional Neural Network for American Sign Language Recognition
title_fullStr Two-Stream Mixed Convolutional Neural Network for American Sign Language Recognition
title_full_unstemmed Two-Stream Mixed Convolutional Neural Network for American Sign Language Recognition
title_short Two-Stream Mixed Convolutional Neural Network for American Sign Language Recognition
title_sort two-stream mixed convolutional neural network for american sign language recognition
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9414785/
https://www.ncbi.nlm.nih.gov/pubmed/36015719
http://dx.doi.org/10.3390/s22165959
work_keys_str_mv AT maying twostreammixedconvolutionalneuralnetworkforamericansignlanguagerecognition
AT xutianpei twostreammixedconvolutionalneuralnetworkforamericansignlanguagerecognition
AT kimkangchul twostreammixedconvolutionalneuralnetworkforamericansignlanguagerecognition