CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement

In recent years, neural networks based on attention mechanisms have seen increasing use in speech recognition, separation, and enhancement, as well as other fields. In particular, the convolution-augmented transformer has performed well, as it can combine the advantages of convolution and self-att...

Bibliographic Details
Main Authors: Chen, Haozhe, Zhang, Xiaojuan
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10137386/
https://www.ncbi.nlm.nih.gov/pubmed/37190416
http://dx.doi.org/10.3390/e25040628
_version_ 1785032450508849152
author Chen, Haozhe
Zhang, Xiaojuan
author_facet Chen, Haozhe
Zhang, Xiaojuan
author_sort Chen, Haozhe
collection PubMed
description In recent years, neural networks based on attention mechanisms have seen increasing use in speech recognition, separation, and enhancement, as well as other fields. In particular, the convolution-augmented transformer has performed well, as it can combine the advantages of convolution and self-attention. Recently, the gated attention unit (GAU) was proposed. Compared with traditional multi-head self-attention, approaches with GAU are effective and computationally efficient. In this paper, we propose a network for speech enhancement called CGA-MGAN, a kind of MetricGAN based on convolution-augmented gated attention. CGA-MGAN captures local and global correlations in speech signals at the same time by fusing convolution and gated attention units. Experiments on Voice Bank + DEMAND show that our proposed CGA-MGAN model achieves excellent performance (3.47 PESQ, 0.96 STOI, and 11.09 dB SSNR) with a relatively small model size (1.14 M).
format Online
Article
Text
id pubmed-10137386
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10137386 2023-04-28 CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement Chen, Haozhe Zhang, Xiaojuan Entropy (Basel) Article In recent years, neural networks based on attention mechanisms have seen increasing use in speech recognition, separation, and enhancement, as well as other fields. In particular, the convolution-augmented transformer has performed well, as it can combine the advantages of convolution and self-attention. Recently, the gated attention unit (GAU) was proposed. Compared with traditional multi-head self-attention, approaches with GAU are effective and computationally efficient. In this paper, we propose a network for speech enhancement called CGA-MGAN, a kind of MetricGAN based on convolution-augmented gated attention. CGA-MGAN captures local and global correlations in speech signals at the same time by fusing convolution and gated attention units. Experiments on Voice Bank + DEMAND show that our proposed CGA-MGAN model achieves excellent performance (3.47 PESQ, 0.96 STOI, and 11.09 dB SSNR) with a relatively small model size (1.14 M). MDPI 2023-04-06 /pmc/articles/PMC10137386/ /pubmed/37190416 http://dx.doi.org/10.3390/e25040628 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Chen, Haozhe
Zhang, Xiaojuan
CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement
title CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10137386/
https://www.ncbi.nlm.nih.gov/pubmed/37190416
http://dx.doi.org/10.3390/e25040628