
Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy

Automatic speech recognition (ASR) is an essential technique for human–computer interaction, and gain control is a commonly used operation in ASR. However, inappropriate gain control strategies can increase the word error rate (WER) of ASR. As there is a current lack of sufficient theoreti...


Bibliographic Details
Main Authors: Wang, Desheng; Wei, Yangjie; Zhang, Ke; Ji, Dong; Wang, Yi
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9027119/
https://www.ncbi.nlm.nih.gov/pubmed/35459013
http://dx.doi.org/10.3390/s22083027
author Wang, Desheng
Wei, Yangjie
Zhang, Ke
Ji, Dong
Wang, Yi
author_sort Wang, Desheng
collection PubMed
description Automatic speech recognition (ASR) is an essential technique for human–computer interaction, and gain control is a commonly used operation in ASR. However, inappropriate gain control strategies can increase the word error rate (WER) of ASR. Because sufficient theoretical analysis and proof of the relationship between gain control and WER are currently lacking, various unconstrained gain control strategies have been adopted in realistic ASR systems, and the optimal gain control with respect to the lowest WER is rarely achieved. This study proposes a gain control strategy named maximized original signal transmission (MOST) to minimize the adverse impact of gain control on ASR systems. First, by modeling the gain control strategy, the quantitative relationship between the gain control strategy and ASR performance is established using the noise figure index. Second, through an analysis of this quantitative relationship, an optimal MOST gain control strategy with minimal performance degradation is theoretically deduced. Finally, comprehensive comparative experiments on a Mandarin dataset show that the proposed MOST gain control strategy can significantly reduce the WER of the experimental ASR system, with a 10% mean absolute WER reduction at −9 dB gain.
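The two quantities the description relies on, a gain expressed in dB and the word error rate, can be illustrated with a minimal sketch. This is not the MOST strategy from the article, which the record does not specify. It only shows the standard amplitude conversion (assuming the gain is an amplitude gain, so the factor is 10^(dB/20)) and the standard edit-distance definition of WER. The function names `db_to_linear`, `apply_gain`, and `word_error_rate` are hypothetical and chosen for illustration.

```python
from __future__ import annotations

from typing import Sequence


def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in decibels to a linear scale factor.

    Assumption: the gain is an amplitude (not power) gain, so the
    conversion is 10 ** (dB / 20).
    """
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples: Sequence[float], gain_db: float) -> list[float]:
    """Scale every sample by the linear factor for the given dB gain."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference).

    Computed as the Levenshtein distance between the two token
    sequences, divided by the number of reference tokens.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)
```

Under these assumptions, a −9 dB gain scales the waveform by a factor of roughly 0.355. For Mandarin, the tokens passed to `word_error_rate` could be individual characters or segmented words; the record does not say which unit the article uses.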
format Online
Article
Text
id pubmed-9027119
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9027119 2022-04-23 Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy Wang, Desheng; Wei, Yangjie; Zhang, Ke; Ji, Dong; Wang, Yi Sensors (Basel) Article MDPI 2022-04-15 /pmc/articles/PMC9027119/ /pubmed/35459013 http://dx.doi.org/10.3390/s22083027 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9027119/
https://www.ncbi.nlm.nih.gov/pubmed/35459013
http://dx.doi.org/10.3390/s22083027