
End-to-End Automatic Pronunciation Error Detection Based on Improved Hybrid CTC/Attention Architecture

Bibliographic Details
Main authors:	Zhang, Long; Zhao, Ziping; Ma, Chunmei; Shan, Linlin; Sun, Huazhi; Jiang, Lifen; Deng, Shiwen; Gao, Chang
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7180994/ https://www.ncbi.nlm.nih.gov/pubmed/32218379 http://dx.doi.org/10.3390/s20071809

collection	PubMed
description	Advanced automatic pronunciation error detection (APED) algorithms are usually based on state-of-the-art automatic speech recognition (ASR) techniques. With the development of deep learning, end-to-end ASR technology has gradually matured and achieved good practical results, which provides a new opportunity to update the APED algorithm. We first constructed an end-to-end ASR system based on the hybrid connectionist temporal classification and attention (CTC/attention) architecture. An adaptive parameter was used to enhance the complementarity of the connectionist temporal classification (CTC) model and the attention-based seq2seq model, further improving the performance of the ASR system. The improved ASR system was then applied to the Mandarin APED task, and good results were obtained. This new APED method makes forced alignment and segmentation unnecessary, and it does not require multiple complex models, such as an acoustic model or a language model. It is convenient and straightforward, and is a suitable general solution for L1-independent computer-assisted pronunciation training (CAPT). Furthermore, we find that, with regard to accuracy metrics, our proposed system based on the improved hybrid CTC/attention architecture is close to the state-of-the-art ASR system based on the deep neural network–deep neural network (DNN–DNN) architecture, and that it performs better on the F-measure metrics, which are especially suited to the requirements of the APED task.
format	Online Article Text
id	pubmed-7180994
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7180994 2020-04-30 Sensors (Basel) Article MDPI 2020-03-25 /pmc/articles/PMC7180994/ /pubmed/32218379 http://dx.doi.org/10.3390/s20071809 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	End-to-End Automatic Pronunciation Error Detection Based on Improved Hybrid CTC/Attention Architecture
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7180994/ https://www.ncbi.nlm.nih.gov/pubmed/32218379 http://dx.doi.org/10.3390/s20071809