Investigations on the Optimal Estimation of Speech Envelopes for the Two-Stage Speech Enhancement

Using the source-filter model of speech production, clean speech signals can be decomposed into an excitation component and an envelope component that is related to the phoneme being uttered. Therefore, restoring the envelope of degraded speech during speech enhancement can improve the intelligibili...

Bibliographic Details
Main authors: Song, Yanjue, Madhu, Nilesh
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10384514/
https://www.ncbi.nlm.nih.gov/pubmed/37514732
http://dx.doi.org/10.3390/s23146438
author Song, Yanjue
Madhu, Nilesh
collection PubMed
description Using the source-filter model of speech production, clean speech signals can be decomposed into an excitation component and an envelope component that is related to the phoneme being uttered. Therefore, restoring the envelope of degraded speech during speech enhancement can improve the intelligibility and quality of output. As the number of phonemes in spoken speech is limited, they can be adequately represented by a correspondingly limited number of envelopes. This can be exploited to improve the estimation of speech envelopes from a degraded signal in a data-driven manner. The improved envelopes are then used in a second stage to refine the final speech estimate. Envelopes are typically derived from the linear prediction coefficients (LPCs) or from the cepstral coefficients (CCs). The improved envelope is obtained either by mapping the degraded envelope onto pre-trained codebooks (classification approach) or by directly estimating it from the degraded envelope (regression approach). In this work, we first investigate the optimal features for envelope representation and codebook generation by a series of oracle tests. We demonstrate that CCs provide better envelope representation compared to using the LPCs. Further, we demonstrate that a unified speech codebook is advantageous compared to the typical codebook that manually splits speech and silence as separate entries. Next, we investigate low-complexity neural network architectures to map degraded envelopes to the optimal codebook entry in practical systems. We confirm that simple recurrent neural networks yield good performance with a low complexity and number of parameters. We also demonstrate that with a careful choice of the feature and architecture, a regression approach can further improve the performance at a lower computational cost. 
However, as also seen from the oracle tests, the benefit of the two-stage framework is now chiefly limited by the statistical noise floor estimate, leading to only a limited improvement in extremely adverse conditions. This highlights the need for further research on joint estimation of speech and noise for optimum enhancement.
format Online
Article
Text
id pubmed-10384514
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10384514 2023-07-30 Investigations on the Optimal Estimation of Speech Envelopes for the Two-Stage Speech Enhancement Song, Yanjue Madhu, Nilesh Sensors (Basel) Article MDPI 2023-07-16 /pmc/articles/PMC10384514/ /pubmed/37514732 http://dx.doi.org/10.3390/s23146438 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Investigations on the Optimal Estimation of Speech Envelopes for the Two-Stage Speech Enhancement
topic Article