CST: Complex Sparse Transformer for Low-SNR Speech Enhancement
Speech enhancement for audio with a low SNR is challenging. Existing speech enhancement methods are designed mainly for high-SNR audio, and they usually use RNNs to model audio sequence features, which prevents the model from learning long-distance dependencies and thus limits its performance on low-SNR speech enhancement tasks. We design a complex transformer module with sparse attention to overcome this problem. Unlike the traditional transformer model, this model is extended to model complex-domain sequences effectively: a sparse attention mask balances the model's attention between long-distance and nearby relations, a pre-layer positional embedding module enhances the model's perception of position information, and a channel attention module enables the model to dynamically adjust the weight distribution between channels according to the input audio. Experimental results show that, in low-SNR speech enhancement tests, our models achieve noticeable improvements in both speech quality and intelligibility.
Main Authors: | Tan, Kaijun; Mao, Wenyu; Guo, Xiaozhou; Lu, Huaxiang; Zhang, Chi; Cao, Zhanzhong; Wang, Xingang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10007472/ https://www.ncbi.nlm.nih.gov/pubmed/36904579 http://dx.doi.org/10.3390/s23052376 |
_version_ | 1784905529980616704 |
author | Tan, Kaijun Mao, Wenyu Guo, Xiaozhou Lu, Huaxiang Zhang, Chi Cao, Zhanzhong Wang, Xingang |
author_facet | Tan, Kaijun Mao, Wenyu Guo, Xiaozhou Lu, Huaxiang Zhang, Chi Cao, Zhanzhong Wang, Xingang |
author_sort | Tan, Kaijun |
collection | PubMed |
description | Speech enhancement for audio with a low SNR is challenging. Existing speech enhancement methods are designed mainly for high-SNR audio, and they usually use RNNs to model audio sequence features, which prevents the model from learning long-distance dependencies and thus limits its performance on low-SNR speech enhancement tasks. We design a complex transformer module with sparse attention to overcome this problem. Unlike the traditional transformer model, this model is extended to model complex-domain sequences effectively: a sparse attention mask balances the model's attention between long-distance and nearby relations, a pre-layer positional embedding module enhances the model's perception of position information, and a channel attention module enables the model to dynamically adjust the weight distribution between channels according to the input audio. Experimental results show that, in low-SNR speech enhancement tests, our models achieve noticeable improvements in both speech quality and intelligibility. |
format | Online Article Text |
id | pubmed-10007472 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10007472 2023-03-12 CST: Complex Sparse Transformer for Low-SNR Speech Enhancement Tan, Kaijun Mao, Wenyu Guo, Xiaozhou Lu, Huaxiang Zhang, Chi Cao, Zhanzhong Wang, Xingang Sensors (Basel) Article Speech enhancement for audio with a low SNR is challenging. Existing speech enhancement methods are designed mainly for high-SNR audio, and they usually use RNNs to model audio sequence features, which prevents the model from learning long-distance dependencies and thus limits its performance on low-SNR speech enhancement tasks. We design a complex transformer module with sparse attention to overcome this problem. Unlike the traditional transformer model, this model is extended to model complex-domain sequences effectively: a sparse attention mask balances the model's attention between long-distance and nearby relations, a pre-layer positional embedding module enhances the model's perception of position information, and a channel attention module enables the model to dynamically adjust the weight distribution between channels according to the input audio. Experimental results show that, in low-SNR speech enhancement tests, our models achieve noticeable improvements in both speech quality and intelligibility. MDPI 2023-02-21 /pmc/articles/PMC10007472/ /pubmed/36904579 http://dx.doi.org/10.3390/s23052376 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Tan, Kaijun Mao, Wenyu Guo, Xiaozhou Lu, Huaxiang Zhang, Chi Cao, Zhanzhong Wang, Xingang CST: Complex Sparse Transformer for Low-SNR Speech Enhancement |
title | CST: Complex Sparse Transformer for Low-SNR Speech Enhancement |
title_full | CST: Complex Sparse Transformer for Low-SNR Speech Enhancement |
title_fullStr | CST: Complex Sparse Transformer for Low-SNR Speech Enhancement |
title_full_unstemmed | CST: Complex Sparse Transformer for Low-SNR Speech Enhancement |
title_short | CST: Complex Sparse Transformer for Low-SNR Speech Enhancement |
title_sort | cst: complex sparse transformer for low-snr speech enhancement |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10007472/ https://www.ncbi.nlm.nih.gov/pubmed/36904579 http://dx.doi.org/10.3390/s23052376 |
work_keys_str_mv | AT tankaijun cstcomplexsparsetransformerforlowsnrspeechenhancement AT maowenyu cstcomplexsparsetransformerforlowsnrspeechenhancement AT guoxiaozhou cstcomplexsparsetransformerforlowsnrspeechenhancement AT luhuaxiang cstcomplexsparsetransformerforlowsnrspeechenhancement AT zhangchi cstcomplexsparsetransformerforlowsnrspeechenhancement AT caozhanzhong cstcomplexsparsetransformerforlowsnrspeechenhancement AT wangxingang cstcomplexsparsetransformerforlowsnrspeechenhancement |
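The abstract above describes a sparse attention mask that balances the model's attention between nearby and long-distance relations. As a rough illustration of the general idea (this record does not specify the paper's exact mask pattern; the local-window-plus-strided combination, and the `window` and `stride` parameters, are assumptions borrowed from common sparse-attention designs), a minimal NumPy sketch:

```python
import numpy as np

def sparse_attention_mask(seq_len: int, window: int = 2, stride: int = 4) -> np.ndarray:
    """Boolean mask of shape (seq_len, seq_len): entry [q, k] is True
    if query position q may attend to key position k.

    Combines two patterns:
      - a local window, capturing nearby relations;
      - strided "global" keys (every stride-th position), capturing
        long-distance relations at reduced cost.
    Illustrative only; not the paper's exact mask.
    """
    idx = np.arange(seq_len)
    # Nearby relations: |q - k| <= window.
    local = np.abs(idx[:, None] - idx[None, :]) <= window
    # Long-distance relations: every query attends to every stride-th key.
    strided = (idx[None, :] % stride) == 0
    return local | np.broadcast_to(strided, (seq_len, seq_len))

mask = sparse_attention_mask(8, window=1, stride=4)
# Each query row allows far fewer keys than full attention, so the
# attention computation touches a sparse subset of the score matrix.
```

In an attention layer such a mask is typically applied by setting the scores at `False` positions to negative infinity before the softmax, so probability mass flows only to the allowed nearby and strided keys.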