Causal speech enhancement using dynamical-weighted loss and attention encoder-decoder recurrent neural network

Speech enhancement (SE) reduces background noise signals in target speech and is applied at the front end in various real-world applications, including robust ASRs and real-time processing in mobile phone communications. SE systems are commonly integrated into mobile phones to increase quality and i...

Bibliographic Details
Main Authors:	Peracha, Fahad Khalil; Khattak, Muhammad Irfan; Salem, Nema; Saleem, Nasir
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2023
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10174555/ https://www.ncbi.nlm.nih.gov/pubmed/37167227 http://dx.doi.org/10.1371/journal.pone.0285629

author	Peracha, Fahad Khalil; Khattak, Muhammad Irfan; Salem, Nema; Saleem, Nasir
collection	PubMed
description	Speech enhancement (SE) reduces background noise signals in target speech and is applied at the front end in various real-world applications, including robust ASRs and real-time processing in mobile phone communications. SE systems are commonly integrated into mobile phones to increase quality and intelligibility. As a result, a low-latency system is required to operate in real-world applications. On the other hand, these systems need efficient optimization. This research focuses on the single-microphone SE operating in real-time systems with better optimization. We propose a causal data-driven model that uses attention encoder-decoder long short-term memory (LSTM) to estimate the time-frequency mask from noisy speech in order to produce clean speech for real-time applications that need low-latency causal processing. The encoder-decoder LSTM and a causal attention mechanism are used in the proposed model. Furthermore, a dynamical-weighted (DW) loss function is proposed to improve model learning by varying the weight loss values. Experiments demonstrated that the proposed model consistently improves voice quality, intelligibility, and noise suppression. In the causal processing mode, the LSTM-based estimated suppression time-frequency mask outperforms the baseline model for unseen noise types. The proposed SE improved the STOI by 2.64% (baseline LSTM-IRM), 6.6% (LSTM-KF), 4.18% (DeepXi-KF), and 3.58% (DeepResGRU-KF). In addition, we examine word error rates (WERs) using Google’s Automatic Speech Recognition (ASR). The ASR results show that error rates decreased from 46.33% (noisy signals) to 13.11% (proposed), 15.73% (LSTM), and 14.97% (LSTM-KF).
format	Online Article Text
id	pubmed-10174555
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-10174555 2023-05-12 PLoS One Research Article Public Library of Science 2023-05-11 /pmc/articles/PMC10174555/ /pubmed/37167227 http://dx.doi.org/10.1371/journal.pone.0285629 Text en © 2023 Peracha et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Causal speech enhancement using dynamical-weighted loss and attention encoder-decoder recurrent neural network
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10174555/ https://www.ncbi.nlm.nih.gov/pubmed/37167227 http://dx.doi.org/10.1371/journal.pone.0285629
