Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network

Current deep learning-based speech enhancement methods focus on enhancing the time–frequency representation of the signal. However, conventional methods can lead to speech damage due to resolution mismatch problems that emphasize only specific information in the time or frequency domain. To address...

Bibliographic Details
Main Authors: Koh, Hyeong Il; Na, Sungdae; Kim, Myoung Nam
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10669314/
https://www.ncbi.nlm.nih.gov/pubmed/38002449
http://dx.doi.org/10.3390/bioengineering10111325
_version_ 1785149222418382848
author Koh, Hyeong Il
Na, Sungdae
Kim, Myoung Nam
author_facet Koh, Hyeong Il
Na, Sungdae
Kim, Myoung Nam
author_sort Koh, Hyeong Il
collection PubMed
description Current deep learning-based speech enhancement methods focus on enhancing the time–frequency representation of the signal. However, conventional methods can lead to speech damage due to resolution mismatch problems that emphasize only specific information in the time or frequency domain. To address these challenges, this paper introduces a speech enhancement model designed with a dual-path structure that identifies key speech characteristics in both the time and time–frequency domains. Specifically, the time path aims to model semantic features hidden in the waveform, while the time–frequency path attempts to compensate for the spectral details via a spectral extension block. These two paths enhance temporal and spectral features via mask functions modeled as LSTM, respectively, offering a comprehensive approach to speech enhancement. Experimental results show that the proposed dual-path LSTM network consistently outperforms conventional single-domain speech enhancement methods in terms of speech quality and intelligibility.
format Online
Article
Text
id pubmed-10669314
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10669314 2023-11-16 Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network Koh, Hyeong Il Na, Sungdae Kim, Myoung Nam Bioengineering (Basel) Article Current deep learning-based speech enhancement methods focus on enhancing the time–frequency representation of the signal. However, conventional methods can lead to speech damage due to resolution mismatch problems that emphasize only specific information in the time or frequency domain. To address these challenges, this paper introduces a speech enhancement model designed with a dual-path structure that identifies key speech characteristics in both the time and time–frequency domains. Specifically, the time path aims to model semantic features hidden in the waveform, while the time–frequency path attempts to compensate for the spectral details via a spectral extension block. These two paths enhance temporal and spectral features via mask functions modeled as LSTM, respectively, offering a comprehensive approach to speech enhancement. Experimental results show that the proposed dual-path LSTM network consistently outperforms conventional single-domain speech enhancement methods in terms of speech quality and intelligibility. MDPI 2023-11-16 /pmc/articles/PMC10669314/ /pubmed/38002449 http://dx.doi.org/10.3390/bioengineering10111325 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Koh, Hyeong Il
Na, Sungdae
Kim, Myoung Nam
Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network
title Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network
title_full Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network
title_fullStr Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network
title_full_unstemmed Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network
title_short Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network
title_sort speech perception improvement algorithm based on a dual-path long short-term memory network
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10669314/
https://www.ncbi.nlm.nih.gov/pubmed/38002449
http://dx.doi.org/10.3390/bioengineering10111325
work_keys_str_mv AT kohhyeongil speechperceptionimprovementalgorithmbasedonadualpathlongshorttermmemorynetwork
AT nasungdae speechperceptionimprovementalgorithmbasedonadualpathlongshorttermmemorynetwork
AT kimmyoungnam speechperceptionimprovementalgorithmbasedonadualpathlongshorttermmemorynetwork