
Speech Enhancement by Multiple Propagation through the Same Neural Network


Bibliographic Details
Main Authors: Grzywalski, Tomasz; Drgas, Szymon
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9003084/
https://www.ncbi.nlm.nih.gov/pubmed/35408056
http://dx.doi.org/10.3390/s22072440
author Grzywalski, Tomasz
Drgas, Szymon
collection PubMed
description Monaural speech enhancement aims to remove background noise from an audio recording containing speech in order to improve its clarity and intelligibility. Currently, the most successful solutions for speech enhancement use deep neural networks. In a typical setting, such neural networks process the noisy input signal once and produce a single enhanced signal. However, it was recently shown that a U-Net-based network can be trained in a way that allows it to process the same input signal multiple times in order to enhance the speech even further. Unfortunately, this was tested only for two-iteration enhancement. In the current research, we extend previous efforts and demonstrate how multi-forward-pass speech enhancement can be successfully applied to other architectures, namely ResBLSTM and Transformer-Net. Moreover, we test the three architectures with up to five iterations, thus identifying the method’s limit in terms of performance gain. In our experiments, we used audio samples from the WSJ0, Noisex-92, and DCASE datasets and measured speech enhancement quality using SI-SDR, STOI, and PESQ. The results show that performing speech enhancement up to five times still improves speech intelligibility, but the gain becomes smaller with each iteration. Nevertheless, performing five iterations instead of two gives an additional 0.6 dB SI-SDR gain and a four-percentage-point STOI gain. However, these increments are not equal across architectures: U-Net and Transformer-Net benefit more from the multi-forward pass than ResBLSTM.
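The core idea the abstract describes, running the same enhancement model repeatedly over its own output and scoring quality (e.g., SI-SDR) after each pass, can be sketched as follows. This is a minimal illustration only: the 3-tap moving-average "enhancer" is a stand-in for the trained network (U-Net, ResBLSTM, or Transformer-Net), not the authors' model, and the signals are synthetic.

```python
import math
import random

def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio (SI-SDR) in dB."""
    alpha = (sum(e * r for e, r in zip(estimate, reference))
             / sum(r * r for r in reference))            # optimal scaling of the reference
    target = [alpha * r for r in reference]              # scaled target component
    noise = [e - t for e, t in zip(estimate, target)]    # residual distortion
    return 10 * math.log10(sum(t * t for t in target) / sum(n * n for n in noise))

def enhance(signal):
    """Toy 'network': a 3-tap moving average that attenuates broadband noise."""
    padded = [signal[0]] + list(signal) + [signal[-1]]   # edge-replicate padding
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(signal))]

random.seed(0)
clean = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]  # slow tone, 1000 samples
noisy = [c + random.gauss(0, 0.5) for c in clean]                    # add Gaussian noise

# Multi-forward-pass enhancement: feed each output back through the same model.
signal, scores = noisy, []
for iteration in range(5):
    signal = enhance(signal)
    scores.append(si_sdr(signal, clean))
```

With a real network, `enhance` would be the model's forward pass; the diminishing per-iteration gains reported in the abstract correspond to the `scores` curve flattening out as iterations increase.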
format Online
Article
Text
id pubmed-9003084
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9003084 2022-04-13 Speech Enhancement by Multiple Propagation through the Same Neural Network Grzywalski, Tomasz; Drgas, Szymon. Sensors (Basel), Article. MDPI 2022-03-22 /pmc/articles/PMC9003084/ /pubmed/35408056 http://dx.doi.org/10.3390/s22072440 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Speech Enhancement by Multiple Propagation through the Same Neural Network
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9003084/
https://www.ncbi.nlm.nih.gov/pubmed/35408056
http://dx.doi.org/10.3390/s22072440