A study of transformer-based end-to-end speech recognition system for Kazakh language
Today, the Transformer model, which allows parallelization and has its own internal attention mechanism, is widely used in the field of speech recognition. The great advantage of this architecture is its fast learning speed and the absence of the sequential operations that recurrent neural networks require. I...
Main authors: | Orken, Mamyrbayev; Dina, Oralbekova; Keylan, Alimhan; Tolganay, Turdalykyzy; Mohamed, Othman |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9117202/ https://www.ncbi.nlm.nih.gov/pubmed/35585130 http://dx.doi.org/10.1038/s41598-022-12260-y |
_version_ | 1784710280988590080 |
---|---|
author | Orken, Mamyrbayev; Dina, Oralbekova; Keylan, Alimhan; Tolganay, Turdalykyzy; Mohamed, Othman |
author_facet | Orken, Mamyrbayev; Dina, Oralbekova; Keylan, Alimhan; Tolganay, Turdalykyzy; Mohamed, Othman |
author_sort | Orken, Mamyrbayev |
collection | PubMed |
description | Today, the Transformer model, which allows parallelization and has its own internal attention mechanism, is widely used in the field of speech recognition. The great advantage of this architecture is its fast learning speed and the absence of the sequential operations that recurrent neural networks require. In this work, Transformer models and an end-to-end model based on connectionist temporal classification were considered for building a system for automatic recognition of Kazakh speech. Kazakh is an agglutinative language and has limited data for implementing speech recognition systems. Some studies have shown that the Transformer model improves system performance for low-resource languages. Our experiments showed that the joint use of Transformer and connectionist temporal classification models improved the performance of the Kazakh speech recognition system, and that with an integrated language model it achieved the best character error rate, 3.7%, on a clean dataset. |
format | Online Article Text |
id | pubmed-9117202 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9117202 2022-05-20 A study of transformer-based end-to-end speech recognition system for Kazakh language Orken, Mamyrbayev; Dina, Oralbekova; Keylan, Alimhan; Tolganay, Turdalykyzy; Mohamed, Othman Sci Rep Article Nature Publishing Group UK 2022-05-18 /pmc/articles/PMC9117202/ /pubmed/35585130 http://dx.doi.org/10.1038/s41598-022-12260-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Orken, Mamyrbayev Dina, Oralbekova Keylan, Alimhan Tolganay, Turdalykyzy Mohamed, Othman A study of transformer-based end-to-end speech recognition system for Kazakh language |
title | A study of transformer-based end-to-end speech recognition system for Kazakh language |
title_full | A study of transformer-based end-to-end speech recognition system for Kazakh language |
title_fullStr | A study of transformer-based end-to-end speech recognition system for Kazakh language |
title_full_unstemmed | A study of transformer-based end-to-end speech recognition system for Kazakh language |
title_short | A study of transformer-based end-to-end speech recognition system for Kazakh language |
title_sort | study of transformer-based end-to-end speech recognition system for kazakh language |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9117202/ https://www.ncbi.nlm.nih.gov/pubmed/35585130 http://dx.doi.org/10.1038/s41598-022-12260-y |
work_keys_str_mv | AT orkenmamyrbayev astudyoftransformerbasedendtoendspeechrecognitionsystemforkazakhlanguage AT dinaoralbekova astudyoftransformerbasedendtoendspeechrecognitionsystemforkazakhlanguage AT keylanalimhan astudyoftransformerbasedendtoendspeechrecognitionsystemforkazakhlanguage AT tolganayturdalykyzy astudyoftransformerbasedendtoendspeechrecognitionsystemforkazakhlanguage AT mohamedothman astudyoftransformerbasedendtoendspeechrecognitionsystemforkazakhlanguage AT orkenmamyrbayev studyoftransformerbasedendtoendspeechrecognitionsystemforkazakhlanguage AT dinaoralbekova studyoftransformerbasedendtoendspeechrecognitionsystemforkazakhlanguage AT keylanalimhan studyoftransformerbasedendtoendspeechrecognitionsystemforkazakhlanguage AT tolganayturdalykyzy studyoftransformerbasedendtoendspeechrecognitionsystemforkazakhlanguage AT mohamedothman studyoftransformerbasedendtoendspeechrecognitionsystemforkazakhlanguage |