Cargando…

CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement

In recent years, neural networks based on attention mechanisms have seen increasingly use in speech recognition, separation, and enhancement, as well as other fields. In particular, the convolution-augmented transformer has performed well, as it can combine the advantages of convolution and self-att...

Descripción completa

Detalles Bibliográficos
Autores principales: Chen, Haozhe, Zhang, Xiaojuan
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10137386/
https://www.ncbi.nlm.nih.gov/pubmed/37190416
http://dx.doi.org/10.3390/e25040628
Descripción
Sumario:In recent years, neural networks based on attention mechanisms have seen increasingly use in speech recognition, separation, and enhancement, as well as other fields. In particular, the convolution-augmented transformer has performed well, as it can combine the advantages of convolution and self-attention. Recently, the gated attention unit (GAU) was proposed. Compared with traditional multi-head self-attention, approaches with GAU are effective and computationally efficient. In this CGA-MGAN: MetricGAN based on Convolution-augmented Gated Attention for Speech Enhancement, we propose a network for speech enhancement called CGA-MGAN, a kind of MetricGAN based on convolution-augmented gated attention. CGA-MGAN captures local and global correlations in speech signals at the same time by fusing convolution and gated attention units. Experiments on Voice Bank + DEMAND show that our proposed CGA-MGAN model achieves excellent performance (3.47 PESQ, 0.96 STOI, and 11.09 dB SSNR) with a relatively small model size (1.14 M).