End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement
Because of their simple design structure, end-to-end deep learning (E2E-DL) models have gained a lot of attention for speech enhancement. A number of DL models have achieved excellent results in eliminating the background noise and enhancing the quality as well as the intelligibility of noisy speech...
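The abstract combines a Convolutional Encoder-Decoder (CED) with Recurrent Neural Networks (RNNs) in a Convolutional Recurrent Network (CRN) framework. A minimal sketch of that idea in PyTorch is below; all layer counts, channel widths, and kernel sizes are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch of a waveform-domain CRN: strided convolutions encode the raw
# waveform, an LSTM bottleneck models its sequential structure, and
# transposed convolutions decode back to waveform resolution.
import torch
import torch.nn as nn

class CRN(nn.Module):
    def __init__(self, channels=16, hidden=32):
        super().__init__()
        # Encoder: each strided conv reduces time resolution by 4x.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=8, stride=4, padding=2),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=8, stride=4, padding=2),
            nn.ReLU(),
        )
        # Recurrent bottleneck captures the sequential characteristics of speech.
        self.rnn = nn.LSTM(channels, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, channels)
        # Decoder: transposed convs restore the original sample rate.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(channels, channels, kernel_size=8, stride=4, padding=2),
            nn.ReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=8, stride=4, padding=2),
        )

    def forward(self, x):                      # x: (batch, 1, samples)
        z = self.encoder(x)                    # (batch, channels, frames)
        z, _ = self.rnn(z.transpose(1, 2))     # (batch, frames, hidden)
        z = self.proj(z).transpose(1, 2)       # (batch, channels, frames)
        return self.decoder(z)                 # (batch, 1, samples)

noisy = torch.randn(2, 1, 16000)               # two 1-second clips at 16 kHz
enhanced = CRN()(noisy)
print(enhanced.shape)                          # torch.Size([2, 1, 16000])
```

With matched kernel/stride/padding choices the encoder and decoder are exact inverses in length, so the enhanced output has the same number of samples as the noisy input — a property any end-to-end waveform enhancer needs.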
Main Authors: Ullah, Rizwan; Wuttisittikulkij, Lunchakorn; Chaudhary, Sushank; Parnianifard, Amir; Shah, Shashi; Ibrar, Muhammad; Wahab, Fazal-E
Format: Online Article Text
Language: English
Published: MDPI, 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9611713/ https://www.ncbi.nlm.nih.gov/pubmed/36298131 http://dx.doi.org/10.3390/s22207782
_version_ | 1784819597231259648 |
author | Ullah, Rizwan Wuttisittikulkij, Lunchakorn Chaudhary, Sushank Parnianifard, Amir Shah, Shashi Ibrar, Muhammad Wahab, Fazal-E |
author_sort | Ullah, Rizwan |
collection | PubMed |
description | Because of their simple design structure, end-to-end deep learning (E2E-DL) models have gained considerable attention for speech enhancement. A number of DL models have achieved excellent results in eliminating background noise and enhancing both the quality and the intelligibility of noisy speech. Designing resource-efficient, compact models for real-time processing remains a key challenge, however. To improve the performance of E2E models, the sequential and local characteristics of the speech signal should be taken into account efficiently during modeling. In this paper, we present resource-efficient, compact neural models for end-to-end noise-robust waveform-based speech enhancement. By combining a Convolutional Encoder-Decoder (CED) and Recurrent Neural Networks (RNNs) in a Convolutional Recurrent Network (CRN) framework, we target several speech enhancement systems. Different noise types and speakers are used to train and test the proposed models. Experiments on the LibriSpeech and DEMAND datasets show that the proposed models yield improved quality and intelligibility with fewer trainable parameters, notably reduced model complexity, and lower inference time than existing recurrent and convolutional models. Quality and intelligibility improve by 31.61% and 17.18%, respectively, over the noisy speech. We further performed a cross-corpus analysis to demonstrate the generalization of the proposed E2E SE models across different speech datasets. |
format | Online Article Text |
id | pubmed-9611713 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9611713 2022-10-28 End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement. Ullah, Rizwan; Wuttisittikulkij, Lunchakorn; Chaudhary, Sushank; Parnianifard, Amir; Shah, Shashi; Ibrar, Muhammad; Wahab, Fazal-E. Sensors (Basel), Article. MDPI 2022-10-13 /pmc/articles/PMC9611713/ /pubmed/36298131 http://dx.doi.org/10.3390/s22207782 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement |
title_sort | end-to-end deep convolutional recurrent models for noise robust waveform speech enhancement |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9611713/ https://www.ncbi.nlm.nih.gov/pubmed/36298131 http://dx.doi.org/10.3390/s22207782 |