
CST: Complex Sparse Transformer for Low-SNR Speech Enhancement

Speech enhancement for audio with a low SNR is challenging. Existing speech enhancement methods are mainly designed for high-SNR audio and usually use RNNs to model audio sequence features, so they cannot learn long-distance dependencies, which limits their performance in low-SNR speech enhancement tasks. We design a complex transformer module with sparse attention to overcome this problem. Unlike the traditional transformer model, this model is extended to effectively model complex-domain sequences: a sparse attention mask balances the model's attention between long-distance and nearby relations, a pre-layer positional embedding module enhances the model's perception of position information, and a channel attention module lets the model dynamically adjust the weight distribution between channels according to the input audio. The experimental results show that, in low-SNR speech enhancement tests, our model achieves noticeable improvements in both speech quality and intelligibility.
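The record itself contains no implementation details, so the following is only a minimal sketch of how a sparse attention mask that balances nearby and long-distance relations could be built, assuming a common local-window-plus-strided pattern. The function name sparse_attention_mask and the window and stride values are illustrative assumptions, not the paper's actual CST design.

# Minimal sketch (not the authors' code): build a boolean attention mask that
# keeps each query's local neighbourhood plus a strided set of distant
# positions, so attention covers both nearby and long-range relations.
import torch

def sparse_attention_mask(seq_len: int, window: int = 8, stride: int = 32) -> torch.Tensor:
    """Return a (seq_len, seq_len) boolean mask; True = attention allowed."""
    idx = torch.arange(seq_len)
    # Local band: each frame attends to frames within +/- window.
    local = (idx[:, None] - idx[None, :]).abs() <= window
    # Strided global columns: every frame may also attend to every stride-th frame.
    global_cols = (idx[None, :] % stride) == 0
    return local | global_cols

mask = sparse_attention_mask(256)
scores = torch.randn(256, 256)                      # dummy attention logits
scores = scores.masked_fill(~mask, float("-inf"))   # block disallowed positions
weights = scores.softmax(dim=-1)                    # sparse attention weights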


Bibliographic Details
Main Authors: Tan, Kaijun, Mao, Wenyu, Guo, Xiaozhou, Lu, Huaxiang, Zhang, Chi, Cao, Zhanzhong, Wang, Xingang
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10007472/
https://www.ncbi.nlm.nih.gov/pubmed/36904579
http://dx.doi.org/10.3390/s23052376
author Tan, Kaijun
Mao, Wenyu
Guo, Xiaozhou
Lu, Huaxiang
Zhang, Chi
Cao, Zhanzhong
Wang, Xingang
collection PubMed
description Speech enhancement for audio with a low SNR is challenging. Existing speech enhancement methods are mainly designed for high-SNR audio and usually use RNNs to model audio sequence features, so they cannot learn long-distance dependencies, which limits their performance in low-SNR speech enhancement tasks. We design a complex transformer module with sparse attention to overcome this problem. Unlike the traditional transformer model, this model is extended to effectively model complex-domain sequences: a sparse attention mask balances the model's attention between long-distance and nearby relations, a pre-layer positional embedding module enhances the model's perception of position information, and a channel attention module lets the model dynamically adjust the weight distribution between channels according to the input audio. The experimental results show that, in low-SNR speech enhancement tests, our model achieves noticeable improvements in both speech quality and intelligibility.
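The description above also mentions a channel attention module that reweights channels according to the input audio. The record gives no specifics, so the sketch below only illustrates the general idea with a squeeze-and-excitation style block; the class name, reduction ratio, and input layout are assumptions, not the paper's actual module.

# Minimal sketch of a squeeze-and-excitation style channel attention block,
# shown only to illustrate the "channel attention" idea from the abstract;
# the CST paper's actual module and hyperparameters may differ.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)             # squeeze: (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                               # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                    # reweight channels

x = torch.randn(2, 16, 64, 100)      # (batch, channels, freq, time) spectrogram-like input
y = ChannelAttention(16)(x)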
format Online
Article
Text
id pubmed-10007472
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10007472 2023-03-12 CST: Complex Sparse Transformer for Low-SNR Speech Enhancement Tan, Kaijun Mao, Wenyu Guo, Xiaozhou Lu, Huaxiang Zhang, Chi Cao, Zhanzhong Wang, Xingang Sensors (Basel) Article Speech enhancement for audio with a low SNR is challenging. Existing speech enhancement methods are mainly designed for high-SNR audio and usually use RNNs to model audio sequence features, so they cannot learn long-distance dependencies, which limits their performance in low-SNR speech enhancement tasks. We design a complex transformer module with sparse attention to overcome this problem. Unlike the traditional transformer model, this model is extended to effectively model complex-domain sequences: a sparse attention mask balances the model's attention between long-distance and nearby relations, a pre-layer positional embedding module enhances the model's perception of position information, and a channel attention module lets the model dynamically adjust the weight distribution between channels according to the input audio. The experimental results show that, in low-SNR speech enhancement tests, our model achieves noticeable improvements in both speech quality and intelligibility. MDPI 2023-02-21 /pmc/articles/PMC10007472/ /pubmed/36904579 http://dx.doi.org/10.3390/s23052376 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title CST: Complex Sparse Transformer for Low-SNR Speech Enhancement
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10007472/
https://www.ncbi.nlm.nih.gov/pubmed/36904579
http://dx.doi.org/10.3390/s23052376