
Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification


Bibliographic Details
Main Authors: Son, Jin-Young, Chang, Joon-Hyuk
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8540800/
https://www.ncbi.nlm.nih.gov/pubmed/34695930
http://dx.doi.org/10.3390/s21206718
_version_ 1784589074416271360
author Son, Jin-Young
Chang, Joon-Hyuk
author_facet Son, Jin-Young
Chang, Joon-Hyuk
author_sort Son, Jin-Young
collection PubMed
description Sound event detection (SED) recognizes the sound event corresponding to an incoming signal and estimates its temporal boundary. Although SED has recently been developed and used in various fields, achieving noise-robust SED in a real environment is typically challenging owing to performance degradation caused by ambient noise. In this paper, we propose combining a pretrained time-domain speech-separation-based noise suppression (NS) network and a pretrained classification network to improve the SED performance in real noisy environments. We use a group communication with context codec (GC3)-equipped temporal convolutional network (TCN) for the noise suppression model and a convolutional recurrent neural network for the SED model. The former significantly reduces the model complexity while maintaining the same TCN module and performance as a fully convolutional time-domain audio separation network (Conv-TasNet). We also do not update the weights of some layers (i.e., we freeze them) during the joint fine-tuning process and add an attention module to the SED model to further improve the performance and prevent overfitting. We evaluate our proposed method using both simulated and real recorded datasets. The experimental results show that our method improves the classification performance in noisy environments under various signal-to-noise-ratio conditions.
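As an illustration of the pipeline described above (a pretrained NS front-end feeding a CRNN-style SED model, with some NS layers frozen and an attention module added during joint fine-tuning), a minimal PyTorch-style sketch follows. The module names, layer sizes, attention placement, and training loop are assumptions for illustration only; they are not the authors' implementation, which uses a GC3-equipped TCN for noise suppression.

    # Minimal sketch (not the authors' code): joint fine-tuning of a pretrained
    # noise-suppression (NS) front-end and an attention-equipped SED back-end.
    # All module names and sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TinyNS(nn.Module):
        """Stand-in for the GC3/TCN noise-suppression network (waveform in, waveform out)."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Conv1d(1, 64, kernel_size=16, stride=8)
            self.separator = nn.Conv1d(64, 64, kernel_size=3, padding=1)
            self.decoder = nn.ConvTranspose1d(64, 1, kernel_size=16, stride=8)

        def forward(self, wav):                       # wav: (batch, 1, samples)
            feats = torch.relu(self.encoder(wav))
            mask = torch.sigmoid(self.separator(feats))
            return self.decoder(feats * mask)         # denoised waveform

    class AttentiveCRNN(nn.Module):
        """Stand-in CRNN SED model with attention pooling over time frames."""
        def __init__(self, n_classes=10):
            super().__init__()
            self.cnn = nn.Sequential(nn.Conv1d(1, 32, kernel_size=16, stride=8), nn.ReLU())
            self.rnn = nn.GRU(32, 64, batch_first=True, bidirectional=True)
            self.attn = nn.Linear(128, 1)             # frame-level attention scores
            self.cls = nn.Linear(128, n_classes)

        def forward(self, wav):                       # wav: (batch, 1, samples)
            h = self.cnn(wav).transpose(1, 2)         # (batch, frames, 32)
            h, _ = self.rnn(h)                        # (batch, frames, 128)
            w = torch.softmax(self.attn(h), dim=1)    # attention weights over frames
            clip = (w * h).sum(dim=1)                 # attention-pooled clip embedding
            return self.cls(clip)                     # clip-level logits

    ns, sed = TinyNS(), AttentiveCRNN()

    # Freeze part of the pretrained NS network during joint fine-tuning,
    # analogous to the paper's freezing of selected layers to limit overfitting.
    for p in ns.encoder.parameters():
        p.requires_grad = False

    params = [p for p in list(ns.parameters()) + list(sed.parameters()) if p.requires_grad]
    opt = torch.optim.Adam(params, lr=1e-4)
    loss_fn = nn.BCEWithLogitsLoss()

    noisy = torch.randn(4, 1, 16000)                  # dummy batch: 1 s of audio at 16 kHz
    labels = torch.randint(0, 2, (4, 10)).float()     # dummy multi-label targets
    logits = sed(ns(noisy))                           # NS output feeds the SED model
    loss = loss_fn(logits, labels)
    loss.backward()                                   # gradients flow through both networks
    opt.step()

In this sketch the denoised waveform is passed directly to the SED model so that gradients from the SED loss also update the unfrozen NS layers, which is the essence of the joint fine-tuning strategy described in the abstract.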
format Online
Article
Text
id pubmed-8540800
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8540800 2021-10-24 Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification Son, Jin-Young Chang, Joon-Hyuk Sensors (Basel) Article Sound event detection (SED) recognizes the sound event corresponding to an incoming signal and estimates its temporal boundary. Although SED has recently been developed and used in various fields, achieving noise-robust SED in a real environment is typically challenging owing to performance degradation caused by ambient noise. In this paper, we propose combining a pretrained time-domain speech-separation-based noise suppression (NS) network and a pretrained classification network to improve the SED performance in real noisy environments. We use a group communication with context codec (GC3)-equipped temporal convolutional network (TCN) for the noise suppression model and a convolutional recurrent neural network for the SED model. The former significantly reduces the model complexity while maintaining the same TCN module and performance as a fully convolutional time-domain audio separation network (Conv-TasNet). We also do not update the weights of some layers (i.e., we freeze them) during the joint fine-tuning process and add an attention module to the SED model to further improve the performance and prevent overfitting. We evaluate our proposed method using both simulated and real recorded datasets. The experimental results show that our method improves the classification performance in noisy environments under various signal-to-noise-ratio conditions. MDPI 2021-10-09 /pmc/articles/PMC8540800/ /pubmed/34695930 http://dx.doi.org/10.3390/s21206718 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Son, Jin-Young
Chang, Joon-Hyuk
Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title_full Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title_fullStr Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title_full_unstemmed Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title_short Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
title_sort attention-based joint training of noise suppression and sound event detection for noise-robust classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8540800/
https://www.ncbi.nlm.nih.gov/pubmed/34695930
http://dx.doi.org/10.3390/s21206718
work_keys_str_mv AT sonjinyoung attentionbasedjointtrainingofnoisesuppressionandsoundeventdetectionfornoiserobustclassification
AT changjoonhyuk attentionbasedjointtrainingofnoisesuppressionandsoundeventdetectionfornoiserobustclassification