Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network
Current deep learning-based speech enhancement methods focus on enhancing the time–frequency representation of the signal. However, conventional methods can lead to speech damage due to resolution mismatch problems that emphasize only specific information in the time or frequency domain. To address...
Main Authors: | Koh, Hyeong Il; Na, Sungdae; Kim, Myoung Nam |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10669314/ https://www.ncbi.nlm.nih.gov/pubmed/38002449 http://dx.doi.org/10.3390/bioengineering10111325 |
_version_ | 1785149222418382848 |
---|---|
author | Koh, Hyeong Il Na, Sungdae Kim, Myoung Nam |
author_facet | Koh, Hyeong Il Na, Sungdae Kim, Myoung Nam |
author_sort | Koh, Hyeong Il |
collection | PubMed |
description | Current deep learning-based speech enhancement methods focus on enhancing the time–frequency representation of the signal. However, conventional methods can lead to speech damage due to resolution mismatch problems that emphasize only specific information in the time or frequency domain. To address these challenges, this paper introduces a speech enhancement model designed with a dual-path structure that identifies key speech characteristics in both the time and time–frequency domains. Specifically, the time path aims to model semantic features hidden in the waveform, while the time–frequency path attempts to compensate for the spectral details via a spectral extension block. These two paths enhance temporal and spectral features via mask functions modeled as LSTM, respectively, offering a comprehensive approach to speech enhancement. Experimental results show that the proposed dual-path LSTM network consistently outperforms conventional single-domain speech enhancement methods in terms of speech quality and intelligibility. |
format | Online Article Text |
id | pubmed-10669314 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10669314 2023-11-16 Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network Koh, Hyeong Il Na, Sungdae Kim, Myoung Nam Bioengineering (Basel) Article Current deep learning-based speech enhancement methods focus on enhancing the time–frequency representation of the signal. However, conventional methods can lead to speech damage due to resolution mismatch problems that emphasize only specific information in the time or frequency domain. To address these challenges, this paper introduces a speech enhancement model designed with a dual-path structure that identifies key speech characteristics in both the time and time–frequency domains. Specifically, the time path aims to model semantic features hidden in the waveform, while the time–frequency path attempts to compensate for the spectral details via a spectral extension block. These two paths enhance temporal and spectral features via mask functions modeled as LSTM, respectively, offering a comprehensive approach to speech enhancement. Experimental results show that the proposed dual-path LSTM network consistently outperforms conventional single-domain speech enhancement methods in terms of speech quality and intelligibility. MDPI 2023-11-16 /pmc/articles/PMC10669314/ /pubmed/38002449 http://dx.doi.org/10.3390/bioengineering10111325 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Koh, Hyeong Il Na, Sungdae Kim, Myoung Nam Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network |
title | Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network |
title_full | Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network |
title_fullStr | Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network |
title_full_unstemmed | Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network |
title_short | Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network |
title_sort | speech perception improvement algorithm based on a dual-path long short-term memory network |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10669314/ https://www.ncbi.nlm.nih.gov/pubmed/38002449 http://dx.doi.org/10.3390/bioengineering10111325 |
work_keys_str_mv | AT kohhyeongil speechperceptionimprovementalgorithmbasedonadualpathlongshorttermmemorynetwork AT nasungdae speechperceptionimprovementalgorithmbasedonadualpathlongshorttermmemorynetwork AT kimmyoungnam speechperceptionimprovementalgorithmbasedonadualpathlongshorttermmemorynetwork |