Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications

Deep learning-based speech-enhancement techniques have recently been an area of growing interest, since their impressive performance can potentially benefit a wide variety of digital voice communication systems. However, such performance has been evaluated mostly in offline audio-processing scenario...

Bibliographic Details
Main Author:	Rascon, Caleb
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181690/ https://www.ncbi.nlm.nih.gov/pubmed/37177598 http://dx.doi.org/10.3390/s23094394

_version_	1785041635007004672
author	Rascon, Caleb
author_facet	Rascon, Caleb
author_sort	Rascon, Caleb
collection	PubMed
description	Deep learning-based speech-enhancement techniques have recently been an area of growing interest, since their impressive performance can potentially benefit a wide variety of digital voice communication systems. However, such performance has been evaluated mostly in offline audio-processing scenarios (i.e., feeding the model, in one go, a complete audio recording, which may extend several seconds). It is of significant interest to evaluate and characterize the current state-of-the-art in applications that process audio online (i.e., feeding the model a sequence of segments of audio data, concatenating the results at the output end). Although evaluations and comparisons between speech-enhancement techniques have been carried out before, as far as the author knows, the work presented here is the first that evaluates the performance of such techniques in relation to their online applicability. This means that this work measures how the output signal-to-interference ratio (as a separation metric), the response time, and memory usage (as online metrics) are impacted by the input length (the size of audio segments), in addition to the amount of noise, amount and number of interferences, and amount of reverberation. Three popular models were evaluated, given their availability on public repositories and online viability, MetricGAN+, Spectral Feature Mapping with Mimic Loss, and Demucs-Denoiser. The characterization was carried out using a systematic evaluation protocol based on the Speechbrain framework. Several intuitions are presented and discussed, and some recommendations for future work are proposed.
format	Online Article Text
id	pubmed-10181690
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10181690 2023-05-13 Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications Rascon, Caleb Sensors (Basel) Article Deep learning-based speech-enhancement techniques have recently been an area of growing interest, since their impressive performance can potentially benefit a wide variety of digital voice communication systems. However, such performance has been evaluated mostly in offline audio-processing scenarios (i.e., feeding the model, in one go, a complete audio recording, which may extend several seconds). It is of significant interest to evaluate and characterize the current state-of-the-art in applications that process audio online (i.e., feeding the model a sequence of segments of audio data, concatenating the results at the output end). Although evaluations and comparisons between speech-enhancement techniques have been carried out before, as far as the author knows, the work presented here is the first that evaluates the performance of such techniques in relation to their online applicability. This means that this work measures how the output signal-to-interference ratio (as a separation metric), the response time, and memory usage (as online metrics) are impacted by the input length (the size of audio segments), in addition to the amount of noise, amount and number of interferences, and amount of reverberation. Three popular models were evaluated, given their availability on public repositories and online viability, MetricGAN+, Spectral Feature Mapping with Mimic Loss, and Demucs-Denoiser. The characterization was carried out using a systematic evaluation protocol based on the Speechbrain framework. Several intuitions are presented and discussed, and some recommendations for future work are proposed. MDPI 2023-04-29 /pmc/articles/PMC10181690/ /pubmed/37177598 http://dx.doi.org/10.3390/s23094394 Text en © 2023 by the author.
https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Rascon, Caleb Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications
title	Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications
title_full	Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications
title_fullStr	Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications
title_full_unstemmed	Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications
title_short	Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications
title_sort	characterization of deep learning-based speech-enhancement techniques in online audio processing applications
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181690/ https://www.ncbi.nlm.nih.gov/pubmed/37177598 http://dx.doi.org/10.3390/s23094394
work_keys_str_mv	AT rasconcaleb characterizationofdeeplearningbasedspeechenhancementtechniquesinonlineaudioprocessingapplications
