
Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification


Bibliographic Details
Main Authors: Son, Jin-Young, Chang, Joon-Hyuk
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8540800/
https://www.ncbi.nlm.nih.gov/pubmed/34695930
http://dx.doi.org/10.3390/s21206718
_version_ 1784589074416271360
author Son, Jin-Young
Chang, Joon-Hyuk
author_facet Son, Jin-Young
Chang, Joon-Hyuk
author_sort Son, Jin-Young
collection PubMed
description Sound event detection (SED) recognizes the sound event corresponding to an incoming signal and estimates its temporal boundary. Although SED has recently been developed and used in various fields, achieving noise-robust SED in a real environment is typically challenging owing to performance degradation caused by ambient noise. In this paper, we propose combining a pretrained time-domain speech-separation-based noise suppression (NS) network and a pretrained classification network to improve the SED performance in real noisy environments. We use a group communication with context codec (GC3)-equipped temporal convolutional network (TCN) for the noise suppression model and a convolutional recurrent neural network for the SED model. The former significantly reduces the model complexity while maintaining the same TCN module and performance as a fully convolutional time-domain audio separation network (Conv-TasNet). We also do not update the weights of some layers (i.e., we freeze them) during the joint fine-tuning process and add an attention module to the SED model to further improve the performance and prevent overfitting. We evaluate our proposed method using both simulated and real recorded datasets. The experimental results show that our method improves the classification performance in noisy environments under various signal-to-noise-ratio conditions.
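As an illustration of the pipeline described above (a pretrained NS front-end feeding a CRNN-style SED model, with some NS layers frozen and an attention module added during joint fine-tuning), a minimal PyTorch-style sketch follows. The module names, layer sizes, attention placement, and training loop are assumptions for illustration only; they are not the authors' implementation, which uses a GC3-equipped TCN for noise suppression.

    # Minimal sketch (not the authors' code): joint fine-tuning of a pretrained
    # noise-suppression (NS) front-end and an attention-equipped SED back-end.
    # All module names and sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TinyNS(nn.Module):
        """Stand-in for the GC3/TCN noise-suppression network (waveform in, waveform out)."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Conv1d(1, 64, kernel_size=16, stride=8)
            self.separator = nn.Conv1d(64, 64, kernel_size=3, padding=1)
            self.decoder = nn.ConvTranspose1d(64, 1, kernel_size=16, stride=8)

        def forward(self, wav):                       # wav: (batch, 1, samples)
            feats = torch.relu(self.encoder(wav))
            mask = torch.sigmoid(self.separator(feats))
            return self.decoder(feats * mask)         # denoised waveform

    class AttentiveCRNN(nn.Module):
        """Stand-in CRNN SED model with attention pooling over time frames."""
        def __init__(self, n_classes=10):
            super().__init__()
            self.cnn = nn.Sequential(nn.Conv1d(1, 32, kernel_size=16, stride=8), nn.ReLU())
            self.rnn = nn.GRU(32, 64, batch_first=True, bidirectional=True)
            self.attn = nn.Linear(128, 1)             # frame-level attention scores
            self.cls = nn.Linear(128, n_classes)

        def forward(self, wav):                       # wav: (batch, 1, samples)
            h = self.cnn(wav).transpose(1, 2)         # (batch, frames, 32)
            h, _ = self.rnn(h)                        # (batch, frames, 128)
            w = torch.softmax(self.attn(h), dim=1)    # attention weights over frames
            clip = (w * h).sum(dim=1)                 # attention-pooled clip embedding
            return self.cls(clip)                     # clip-level logits

    ns, sed = TinyNS(), AttentiveCRNN()

    # Freeze part of the pretrained NS network during joint fine-tuning,
    # analogous to the paper's freezing of selected layers to limit overfitting.
    for p in ns.encoder.parameters():
        p.requires_grad = False

    params = [p for p in list(ns.parameters()) + list(sed.parameters()) if p.requires_grad]
    opt = torch.optim.Adam(params, lr=1e-4)
    loss_fn = nn.BCEWithLogitsLoss()

    noisy = torch.randn(4, 1, 16000)                  # dummy batch: 1 s of audio at 16 kHz
    labels = torch.randint(0, 2, (4, 10)).float()     # dummy multi-label targets
    logits = sed(ns(noisy))                           # NS output feeds the SED model
    loss = loss_fn(logits, labels)
    loss.backward()                                   # gradients flow through both networks
    opt.step()

In this sketch the denoised waveform is passed directly to the SED model so that gradients from the SED loss also update the unfrozen NS layers, which is the essence of the joint fine-tuning strategy described in the abstract.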
format Online
Article
Text
id pubmed-8540800
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8540800 2021-10-24 Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification Son, Jin-Young Chang, Joon-Hyuk Sensors (Basel) Article Sound event detection (SED) recognizes the sound event corresponding to an incoming signal and estimates its temporal boundary. Although SED has recently been developed and used in various fields, achieving noise-robust SED in a real environment is typically challenging owing to performance degradation caused by ambient noise. In this paper, we propose combining a pretrained time-domain speech-separation-based noise suppression (NS) network and a pretrained classification network to improve the SED performance in real noisy environments. We use a group communication with context codec (GC3)-equipped temporal convolutional network (TCN) for the noise suppression model and a convolutional recurrent neural network for the SED model. The former significantly reduces the model complexity while maintaining the same TCN module and performance as a fully convolutional time-domain audio separation network (Conv-TasNet). We also do not update the weights of some layers (i.e., we freeze them) during the joint fine-tuning process and add an attention module to the SED model to further improve the performance and prevent overfitting. We evaluate our proposed method using both simulated and real recorded datasets. The experimental results show that our method improves the classification performance in noisy environments under various signal-to-noise-ratio conditions. MDPI 2021-10-09 /pmc/articles/PMC8540800/ /pubmed/34695930 http://dx.doi.org/10.3390/s21206718 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Son, Jin-Young
Chang, Joon-Hyuk
Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title_full Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title_fullStr Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title_full_unstemmed Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title_short Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title_sort attention-based joint training of noise suppression and sound event detection for noise-robust classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8540800/
https://www.ncbi.nlm.nih.gov/pubmed/34695930
http://dx.doi.org/10.3390/s21206718
work_keys_str_mv AT sonjinyoung attentionbasedjointtrainingofnoisesuppressionandsoundeventdetectionfornoiserobustclassification
AT changjoonhyuk attentionbasedjointtrainingofnoisesuppressionandsoundeventdetectionfornoiserobustclassification