A study of transformer-based end-to-end speech recognition system for Kazakh language

Today, the Transformer model, which allows parallelization and uses self-attention, is widely used in speech recognition. The great advantage of this architecture is its fast training speed and the absence of sequential operations, in contrast to recurrent neural networks. I...

Bibliographic Details
Main authors: Orken, Mamyrbayev; Dina, Oralbekova; Keylan, Alimhan; Tolganay, Turdalykyzy; Mohamed, Othman
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9117202/
https://www.ncbi.nlm.nih.gov/pubmed/35585130
http://dx.doi.org/10.1038/s41598-022-12260-y
author Orken, Mamyrbayev
Dina, Oralbekova
Keylan, Alimhan
Tolganay, Turdalykyzy
Mohamed, Othman
collection PubMed
description Today, the Transformer model, which allows parallelization and uses self-attention, is widely used in speech recognition. The great advantage of this architecture is its fast training speed and the absence of sequential operations, in contrast to recurrent neural networks. In this work, Transformer models and an end-to-end model based on connectionist temporal classification (CTC) were considered for building a system for automatic recognition of Kazakh speech. Kazakh is an agglutinative language and has limited data available for implementing speech recognition systems. Some studies have shown that the Transformer model improves system performance for low-resource languages. Our experiments showed that the joint use of Transformer and CTC models improved the performance of the Kazakh speech recognition system, and with an integrated language model it achieved the best character error rate, 3.7%, on a clean dataset.
format Online
Article
Text
id pubmed-9117202
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9117202 2022-05-20 A study of transformer-based end-to-end speech recognition system for Kazakh language Orken, Mamyrbayev Dina, Oralbekova Keylan, Alimhan Tolganay, Turdalykyzy Mohamed, Othman Sci Rep Article Today, the Transformer model, which allows parallelization and uses self-attention, is widely used in speech recognition. The great advantage of this architecture is its fast training speed and the absence of sequential operations, in contrast to recurrent neural networks. In this work, Transformer models and an end-to-end model based on connectionist temporal classification (CTC) were considered for building a system for automatic recognition of Kazakh speech. Kazakh is an agglutinative language and has limited data available for implementing speech recognition systems. Some studies have shown that the Transformer model improves system performance for low-resource languages. Our experiments showed that the joint use of Transformer and CTC models improved the performance of the Kazakh speech recognition system, and with an integrated language model it achieved the best character error rate, 3.7%, on a clean dataset. Nature Publishing Group UK 2022-05-18 /pmc/articles/PMC9117202/ /pubmed/35585130 http://dx.doi.org/10.1038/s41598-022-12260-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title A study of transformer-based end-to-end speech recognition system for Kazakh language
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9117202/
https://www.ncbi.nlm.nih.gov/pubmed/35585130
http://dx.doi.org/10.1038/s41598-022-12260-y