Cargando…

Improved Transformer-Based Dual-Path Network with Amplitude and Complex Domain Feature Fusion for Speech Enhancement

Most previous speech enhancement methods only predict amplitude features, but more and more studies have proved that phase information is crucial for speech quality. Recently, there have also been some methods to choose complex features, but complex masks are difficult to estimate. Removing noise wh...

Descripción completa

Detalles Bibliográficos
Autores principales: Ye, Moujia, Wan, Hongjie
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955017/
https://www.ncbi.nlm.nih.gov/pubmed/36832595
http://dx.doi.org/10.3390/e25020228
_version_ 1784894253061636096
author Ye, Moujia
Wan, Hongjie
author_facet Ye, Moujia
Wan, Hongjie
author_sort Ye, Moujia
collection PubMed
description Most previous speech enhancement methods only predict amplitude features, but more and more studies have proved that phase information is crucial for speech quality. Recently, there have also been some methods to choose complex features, but complex masks are difficult to estimate. Removing noise while maintaining good speech quality at low signal-to-noise ratios is still a problem. This study proposes a dual-path network structure for speech enhancement that can model complex spectra and amplitudes simultaneously, and introduces an attention-aware feature fusion module to fuse the two features to facilitate overall spectrum recovery. In addition, we improve a transformer-based feature extraction module that can efficiently extract local and global features. The proposed network achieves better performance than the baseline models in experiments on the Voice Bank + DEMAND dataset. We also conducted ablation experiments to verify the effectiveness of the dual-path structure, the improved transformer, and the fusion module, and investigated the effect of the input-mask multiplication strategy on the results.
format Online
Article
Text
id pubmed-9955017
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-99550172023-02-25 Improved Transformer-Based Dual-Path Network with Amplitude and Complex Domain Feature Fusion for Speech Enhancement Ye, Moujia Wan, Hongjie Entropy (Basel) Article Most previous speech enhancement methods only predict amplitude features, but more and more studies have proved that phase information is crucial for speech quality. Recently, there have also been some methods to choose complex features, but complex masks are difficult to estimate. Removing noise while maintaining good speech quality at low signal-to-noise ratios is still a problem. This study proposes a dual-path network structure for speech enhancement that can model complex spectra and amplitudes simultaneously, and introduces an attention-aware feature fusion module to fuse the two features to facilitate overall spectrum recovery. In addition, we improve a transformer-based feature extraction module that can efficiently extract local and global features. The proposed network achieves better performance than the baseline models in experiments on the Voice Bank + DEMAND dataset. We also conducted ablation experiments to verify the effectiveness of the dual-path structure, the improved transformer, and the fusion module, and investigated the effect of the input-mask multiplication strategy on the results. MDPI 2023-01-26 /pmc/articles/PMC9955017/ /pubmed/36832595 http://dx.doi.org/10.3390/e25020228 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Ye, Moujia
Wan, Hongjie
Improved Transformer-Based Dual-Path Network with Amplitude and Complex Domain Feature Fusion for Speech Enhancement
title Improved Transformer-Based Dual-Path Network with Amplitude and Complex Domain Feature Fusion for Speech Enhancement
title_full Improved Transformer-Based Dual-Path Network with Amplitude and Complex Domain Feature Fusion for Speech Enhancement
title_fullStr Improved Transformer-Based Dual-Path Network with Amplitude and Complex Domain Feature Fusion for Speech Enhancement
title_full_unstemmed Improved Transformer-Based Dual-Path Network with Amplitude and Complex Domain Feature Fusion for Speech Enhancement
title_short Improved Transformer-Based Dual-Path Network with Amplitude and Complex Domain Feature Fusion for Speech Enhancement
title_sort improved transformer-based dual-path network with amplitude and complex domain feature fusion for speech enhancement
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955017/
https://www.ncbi.nlm.nih.gov/pubmed/36832595
http://dx.doi.org/10.3390/e25020228
work_keys_str_mv AT yemoujia improvedtransformerbaseddualpathnetworkwithamplitudeandcomplexdomainfeaturefusionforspeechenhancement
AT wanhongjie improvedtransformerbaseddualpathnetworkwithamplitudeandcomplexdomainfeaturefusionforspeechenhancement