Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
Sound event detection (SED) recognizes the corresponding sound event of an incoming signal and estimates its temporal boundary. Although SED has been recently developed and used in various fields, achieving noise-robust SED in a real environment is typically challenging owing to the performance degr...
Main Authors: | Son, Jin-Young; Chang, Joon-Hyuk |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8540800/ https://www.ncbi.nlm.nih.gov/pubmed/34695930 http://dx.doi.org/10.3390/s21206718 |
_version_ | 1784589074416271360 |
---|---|
author | Son, Jin-Young Chang, Joon-Hyuk |
author_facet | Son, Jin-Young Chang, Joon-Hyuk |
author_sort | Son, Jin-Young |
collection | PubMed |
description | Sound event detection (SED) recognizes the corresponding sound event of an incoming signal and estimates its temporal boundary. Although SED has been recently developed and used in various fields, achieving noise-robust SED in a real environment is typically challenging owing to the performance degradation due to ambient noise. In this paper, we propose combining a pretrained time-domain speech-separation-based noise suppression network (NS) and a pretrained classification network to improve the SED performance in real noisy environments. We use group communication with a context codec method (GC3)-equipped temporal convolutional network (TCN) for the noise suppression model and a convolutional recurrent neural network for the SED model. The former significantly reduces the model complexity while maintaining the same TCN module and performance as a fully convolutional time-domain audio separation network (Conv-TasNet). We also do not update the weights of some layers (i.e., freeze) in the joint fine-tuning process and add an attention module in the SED model to further improve the performance and prevent overfitting. We evaluate our proposed method using both simulation and real recorded datasets. The experimental results show that our method improves the classification performance in a noisy environment under various signal-to-noise-ratio conditions. |
format | Online Article Text |
id | pubmed-8540800 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8540800 2021-10-24 Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification Son, Jin-Young Chang, Joon-Hyuk Sensors (Basel) Article Sound event detection (SED) recognizes the corresponding sound event of an incoming signal and estimates its temporal boundary. Although SED has been recently developed and used in various fields, achieving noise-robust SED in a real environment is typically challenging owing to the performance degradation due to ambient noise. In this paper, we propose combining a pretrained time-domain speech-separation-based noise suppression network (NS) and a pretrained classification network to improve the SED performance in real noisy environments. We use group communication with a context codec method (GC3)-equipped temporal convolutional network (TCN) for the noise suppression model and a convolutional recurrent neural network for the SED model. The former significantly reduces the model complexity while maintaining the same TCN module and performance as a fully convolutional time-domain audio separation network (Conv-TasNet). We also do not update the weights of some layers (i.e., freeze) in the joint fine-tuning process and add an attention module in the SED model to further improve the performance and prevent overfitting. We evaluate our proposed method using both simulation and real recorded datasets. The experimental results show that our method improves the classification performance in a noisy environment under various signal-to-noise-ratio conditions. MDPI 2021-10-09 /pmc/articles/PMC8540800/ /pubmed/34695930 http://dx.doi.org/10.3390/s21206718 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Son, Jin-Young Chang, Joon-Hyuk Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification |
title | Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification |
title_full | Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification |
title_fullStr | Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification |
title_full_unstemmed | Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification |
title_short | Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification |
title_sort | attention-based joint training of noise suppression and sound event detection for noise-robust classification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8540800/ https://www.ncbi.nlm.nih.gov/pubmed/34695930 http://dx.doi.org/10.3390/s21206718 |
work_keys_str_mv | AT sonjinyoung attentionbasedjointtrainingofnoisesuppressionandsoundeventdetectionfornoiserobustclassification AT changjoonhyuk attentionbasedjointtrainingofnoisesuppressionandsoundeventdetectionfornoiserobustclassification |