
Multi-Task Transformer with Adaptive Cross-Entropy Loss for Multi-Dialect Speech Recognition



Bibliographic Details
Main authors:	Dan, Zhengjia; Zhao, Yue; Bi, Xiaojun; Wu, Licheng; Ji, Qiang
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9601745/ https://www.ncbi.nlm.nih.gov/pubmed/37420449 http://dx.doi.org/10.3390/e24101429

author	Dan, Zhengjia; Zhao, Yue; Bi, Xiaojun; Wu, Licheng; Ji, Qiang
collection	PubMed
description	At present, most multi-dialect speech recognition models are based on a hard-parameter-sharing multi-task structure, which makes it difficult to reveal how one task contributes to others. In addition, in order to balance multi-task learning, the weights of the multi-task objective function need to be manually adjusted. This makes multi-task learning very difficult and costly because it requires constantly trying various combinations of weights to determine the optimal task weights. In this paper, we propose a multi-dialect acoustic model that combines soft-parameter-sharing multi-task learning with Transformer, and introduce several auxiliary cross-attentions to enable the auxiliary task (dialect ID recognition) to provide dialect information for the multi-dialect speech recognition task. Furthermore, we use the adaptive cross-entropy loss function as the multi-task objective function, which automatically balances the learning of the multi-task model according to the loss proportion of each task during the training process. Therefore, the optimal weight combination can be found without any manual intervention. Finally, for the two tasks of multi-dialect (including low-resource dialect) speech recognition and dialect ID recognition, the experimental results show that, compared with single-dialect Transformer, single-task multi-dialect Transformer, and multi-task Transformer with hard parameter sharing, our method significantly reduces the average syllable error rate of Tibetan multi-dialect speech recognition and the character error rate of Chinese multi-dialect speech recognition.
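The description says the adaptive cross-entropy loss balances the tasks "according to the loss proportion of each task" but does not give the formula. As a minimal sketch, assuming each task's weight is simply its share of the summed per-task cross-entropy (a hypothetical reading, not the authors' published definition), the balancing step could look like this. The names `cross_entropy`, `adaptive_multitask_loss`, and the task keys `"asr"` and `"dialect_id"` are illustrative only.

```python
import math


def cross_entropy(probs, target):
    """Cross-entropy of a single example: -log p(target)."""
    return -math.log(probs[target])


def adaptive_multitask_loss(task_losses):
    """Hypothetical adaptive weighting (assumption, not the paper's formula):
    each task's weight is its share of the summed loss, so the task that is
    currently lagging (higher loss) receives more weight. Weights sum to 1.
    In an autograd framework the weights would likely be treated as
    constants (detached); that choice is also an assumption."""
    total = sum(task_losses.values())
    if total == 0:
        # Degenerate case: all losses are zero, fall back to equal weights.
        n = len(task_losses)
        weights = {name: 1.0 / n for name in task_losses}
    else:
        weights = {name: loss / total for name, loss in task_losses.items()}
    combined = sum(weights[name] * loss for name, loss in task_losses.items())
    return combined, weights


# Toy per-task losses for one training step (made-up probabilities).
losses = {
    "asr": cross_entropy([0.1, 0.7, 0.2], 1),      # speech recognition task
    "dialect_id": cross_entropy([0.5, 0.5], 0),    # auxiliary dialect ID task
}
combined, weights = adaptive_multitask_loss(losses)
print(weights, combined)
```

Because the weights are recomputed from the current losses at every step, no manual search over weight combinations is needed under this assumption; the dialect ID task, whose loss is higher in the toy example, gets the larger weight.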
format	Online Article Text
id	pubmed-9601745
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9601745 2022-10-27 Multi-Task Transformer with Adaptive Cross-Entropy Loss for Multi-Dialect Speech Recognition Dan, Zhengjia Zhao, Yue Bi, Xiaojun Wu, Licheng Ji, Qiang Entropy (Basel) Article At present, most multi-dialect speech recognition models are based on a hard-parameter-sharing multi-task structure, which makes it difficult to reveal how one task contributes to others. In addition, in order to balance multi-task learning, the weights of the multi-task objective function need to be manually adjusted. This makes multi-task learning very difficult and costly because it requires constantly trying various combinations of weights to determine the optimal task weights. In this paper, we propose a multi-dialect acoustic model that combines soft-parameter-sharing multi-task learning with Transformer, and introduce several auxiliary cross-attentions to enable the auxiliary task (dialect ID recognition) to provide dialect information for the multi-dialect speech recognition task. Furthermore, we use the adaptive cross-entropy loss function as the multi-task objective function, which automatically balances the learning of the multi-task model according to the loss proportion of each task during the training process. Therefore, the optimal weight combination can be found without any manual intervention. Finally, for the two tasks of multi-dialect (including low-resource dialect) speech recognition and dialect ID recognition, the experimental results show that, compared with single-dialect Transformer, single-task multi-dialect Transformer, and multi-task Transformer with hard parameter sharing, our method significantly reduces the average syllable error rate of Tibetan multi-dialect speech recognition and the character error rate of Chinese multi-dialect speech recognition. MDPI 2022-10-08 /pmc/articles/PMC9601745/ /pubmed/37420449 http://dx.doi.org/10.3390/e24101429 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Multi-Task Transformer with Adaptive Cross-Entropy Loss for Multi-Dialect Speech Recognition
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9601745/ https://www.ncbi.nlm.nih.gov/pubmed/37420449 http://dx.doi.org/10.3390/e24101429


Similar Items