An Improvement to Conformer-Based Model for High-Accuracy Speech Feature Extraction and Learning

Owing to the loss of effective information and incomplete feature extraction caused by the convolution and pooling operations in a convolution subsampling network, the accuracy and speed of current speech processing architectures based on the conformer model are influenced because the shallow featur...

Bibliographic Details
Main Authors:	Liu, Mengzhuo; Wei, Yangjie
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9324068/ https://www.ncbi.nlm.nih.gov/pubmed/35885089 http://dx.doi.org/10.3390/e24070866

_version_	1784756716446941184
author	Liu, Mengzhuo Wei, Yangjie
author_facet	Liu, Mengzhuo Wei, Yangjie
author_sort	Liu, Mengzhuo
collection	PubMed
description	Owing to the loss of effective information and incomplete feature extraction caused by the convolution and pooling operations in a convolution subsampling network, the accuracy and speed of current speech processing architectures based on the conformer model are influenced because the shallow features of speech signals are not completely extracted. To solve these problems, in this study, we researched a method that used a capsule network to improve the accuracy of feature extraction in a conformer-based model, and then, we proposed a new end-to-end model architecture for speech recognition. First, to improve the accuracy of speech feature extraction, a capsule network with a dynamic routing mechanism was introduced into the conformer model; thus, the structural information in speech was preserved, and it was input to the conformer blocks via sequestered vectors; the learning ability of the conformer-based model was significantly enhanced using dynamic weight updating. Second, a residual network was added to the capsule blocks; thus, the mapping ability of our model was improved and the training difficulty was reduced. Furthermore, the bi-transformer model was adopted in the decoding network to promote the consistency of the hypotheses in different directions through bidirectional modeling. Finally, the effectiveness and robustness of the proposed model were verified against different types of recognition models by performing multiple sets of experiments. The experimental results demonstrated that our speech recognition model achieved a lower word error rate without a language model because of the higher accuracy of speech feature extraction and learning using our model architecture with a capsule network. Furthermore, our model architecture benefited from the advantages of the capsule network and the conformer encoder, and it also has potential for other speech-related applications.
format	Online Article Text
id	pubmed-9324068
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9324068 2022-07-27 An Improvement to Conformer-Based Model for High-Accuracy Speech Feature Extraction and Learning Liu, Mengzhuo Wei, Yangjie Entropy (Basel) Article Owing to the loss of effective information and incomplete feature extraction caused by the convolution and pooling operations in a convolution subsampling network, the accuracy and speed of current speech processing architectures based on the conformer model are influenced because the shallow features of speech signals are not completely extracted. To solve these problems, in this study, we researched a method that used a capsule network to improve the accuracy of feature extraction in a conformer-based model, and then, we proposed a new end-to-end model architecture for speech recognition. First, to improve the accuracy of speech feature extraction, a capsule network with a dynamic routing mechanism was introduced into the conformer model; thus, the structural information in speech was preserved, and it was input to the conformer blocks via sequestered vectors; the learning ability of the conformer-based model was significantly enhanced using dynamic weight updating. Second, a residual network was added to the capsule blocks; thus, the mapping ability of our model was improved and the training difficulty was reduced. Furthermore, the bi-transformer model was adopted in the decoding network to promote the consistency of the hypotheses in different directions through bidirectional modeling. Finally, the effectiveness and robustness of the proposed model were verified against different types of recognition models by performing multiple sets of experiments. The experimental results demonstrated that our speech recognition model achieved a lower word error rate without a language model because of the higher accuracy of speech feature extraction and learning using our model architecture with a capsule network. Furthermore, our model architecture benefited from the advantages of the capsule network and the conformer encoder, and it also has potential for other speech-related applications. MDPI 2022-06-23 /pmc/articles/PMC9324068/ /pubmed/35885089 http://dx.doi.org/10.3390/e24070866 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Liu, Mengzhuo Wei, Yangjie An Improvement to Conformer-Based Model for High-Accuracy Speech Feature Extraction and Learning
title	An Improvement to Conformer-Based Model for High-Accuracy Speech Feature Extraction and Learning
title_full	An Improvement to Conformer-Based Model for High-Accuracy Speech Feature Extraction and Learning
title_fullStr	An Improvement to Conformer-Based Model for High-Accuracy Speech Feature Extraction and Learning
title_full_unstemmed	An Improvement to Conformer-Based Model for High-Accuracy Speech Feature Extraction and Learning
title_short	An Improvement to Conformer-Based Model for High-Accuracy Speech Feature Extraction and Learning
title_sort	improvement to conformer-based model for high-accuracy speech feature extraction and learning
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9324068/ https://www.ncbi.nlm.nih.gov/pubmed/35885089 http://dx.doi.org/10.3390/e24070866
