CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement
In recent years, neural networks based on attention mechanisms have seen increasing use in speech recognition, separation, and enhancement, as well as other fields. In particular, the convolution-augmented transformer has performed well, as it can combine the advantages of convolution and self-att...
Main Authors: | Chen, Haozhe; Zhang, Xiaojuan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10137386/ https://www.ncbi.nlm.nih.gov/pubmed/37190416 http://dx.doi.org/10.3390/e25040628 |
_version_ | 1785032450508849152 |
---|---|
author | Chen, Haozhe Zhang, Xiaojuan |
author_facet | Chen, Haozhe Zhang, Xiaojuan |
author_sort | Chen, Haozhe |
collection | PubMed |
description | In recent years, neural networks based on attention mechanisms have seen increasing use in speech recognition, separation, and enhancement, as well as other fields. In particular, the convolution-augmented transformer has performed well, as it can combine the advantages of convolution and self-attention. Recently, the gated attention unit (GAU) was proposed. Compared with traditional multi-head self-attention, approaches with GAU are effective and computationally efficient. In this paper, we propose a network for speech enhancement called CGA-MGAN, a kind of MetricGAN based on convolution-augmented gated attention. CGA-MGAN captures local and global correlations in speech signals at the same time by fusing convolution and gated attention units. Experiments on Voice Bank + DEMAND show that our proposed CGA-MGAN model achieves excellent performance (3.47 PESQ, 0.96 STOI, and 11.09 dB SSNR) with a relatively small model size (1.14 M). |
format | Online Article Text |
id | pubmed-10137386 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10137386 2023-04-28 CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement Chen, Haozhe Zhang, Xiaojuan Entropy (Basel) Article In recent years, neural networks based on attention mechanisms have seen increasing use in speech recognition, separation, and enhancement, as well as other fields. In particular, the convolution-augmented transformer has performed well, as it can combine the advantages of convolution and self-attention. Recently, the gated attention unit (GAU) was proposed. Compared with traditional multi-head self-attention, approaches with GAU are effective and computationally efficient. In this paper, we propose a network for speech enhancement called CGA-MGAN, a kind of MetricGAN based on convolution-augmented gated attention. CGA-MGAN captures local and global correlations in speech signals at the same time by fusing convolution and gated attention units. Experiments on Voice Bank + DEMAND show that our proposed CGA-MGAN model achieves excellent performance (3.47 PESQ, 0.96 STOI, and 11.09 dB SSNR) with a relatively small model size (1.14 M). MDPI 2023-04-06 /pmc/articles/PMC10137386/ /pubmed/37190416 http://dx.doi.org/10.3390/e25040628 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Chen, Haozhe Zhang, Xiaojuan CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement |
title | CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement |
title_full | CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement |
title_fullStr | CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement |
title_full_unstemmed | CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement |
title_short | CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement |
title_sort | cga-mgan: metric gan based on convolution-augmented gated attention for speech enhancement |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10137386/ https://www.ncbi.nlm.nih.gov/pubmed/37190416 http://dx.doi.org/10.3390/e25040628 |
work_keys_str_mv | AT chenhaozhe cgamganmetricganbasedonconvolutionaugmentedgatedattentionforspeechenhancement AT zhangxiaojuan cgamganmetricganbasedonconvolutionaugmentedgatedattentionforspeechenhancement |