
Sound Event Detection by Pseudo-Labeling in Weakly Labeled Dataset

Weakly labeled sound event detection (WSED) is an important task because it can ease data collection before a strongly labeled sound event dataset is constructed. Recent high-performing deep learning-based WSED methods use a segmentation mask to detect the target feature map...


Bibliographic Details
Main authors:	Park, Chungho; Kim, Donghyeon; Ko, Hanseok
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8705589/ https://www.ncbi.nlm.nih.gov/pubmed/34960475 http://dx.doi.org/10.3390/s21248375

_version_	1784621983867076608
author	Park, Chungho; Kim, Donghyeon; Ko, Hanseok
author_facet	Park, Chungho; Kim, Donghyeon; Ko, Hanseok
author_sort	Park, Chungho
collection	PubMed
description	Weakly labeled sound event detection (WSED) is an important task because it can ease data collection before a strongly labeled sound event dataset is constructed. Recent high-performing deep learning-based WSED methods use a segmentation mask to detect the target feature map. However, their detection accuracy on real streaming audio has been limited for two reasons. First, the convolutional neural networks (CNNs) used to extract the segmentation mask do not appropriately highlight the importance of features, because features are extracted without pooling operations; at the same time, the small kernel size keeps the receptive field small, making it difficult to learn varied patterns. Second, because feature maps are obtained in an end-to-end fashion, the WSED model is vulnerable to unknown content in the wild. These limitations lead to undesired feature maps, such as noise, in unseen environments. This paper addresses these issues by constructing a more efficient model that employs a gated linear unit (GLU) and dilated convolution to mitigate the de-emphasis of feature importance and the limited receptive field. In addition, this paper proposes pseudo-label-based learning that classifies target content and unknown content by adding a 'noise label' and a 'noise loss', so that unknown content is separated as much as possible through the noise label. The experiment is performed by mixing DCASE 2018 Task 1 acoustic scene data and Task 2 sound event data. The experimental results show that the proposed SED model achieves the best F1 scores: 59.7% at 0 SNR, 64.5% at 10 SNR, and 65.9% at 20 SNR. These results represent improvements of 17.7%, 16.9%, and 16.5%, respectively, over the baseline.
format	Online Article Text
id	pubmed-8705589
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8705589 2021-12-25 Sound Event Detection by Pseudo-Labeling in Weakly Labeled Dataset Park, Chungho; Kim, Donghyeon; Ko, Hanseok Sensors (Basel) Article
MDPI 2021-12-15 /pmc/articles/PMC8705589/ /pubmed/34960475 http://dx.doi.org/10.3390/s21248375 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Sound Event Detection by Pseudo-Labeling in Weakly Labeled Dataset
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8705589/ https://www.ncbi.nlm.nih.gov/pubmed/34960475 http://dx.doi.org/10.3390/s21248375
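
The description above names two building blocks, a gated linear unit (GLU) and dilated convolution, as the remedy for weak feature emphasis and a small receptive field. The following is a minimal pure-Python sketch of those two ideas in one dimension. It is not the authors' model: the function names, kernel values, and dilation schedule are hypothetical and chosen only for illustration.

```python
import math


def dilated_conv1d(x, kernel, dilation=1):
    """Valid (unpadded) 1-D dilated cross-correlation over a list of numbers."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(k * x[t + i * dilation] for i, k in enumerate(kernel))
        for t in range(len(x) - span)
    ]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def glu_conv1d(x, kernel_lin, kernel_gate, dilation=1):
    """GLU: a linear branch scaled elementwise by a sigmoid gate branch."""
    lin = dilated_conv1d(x, kernel_lin, dilation)
    gate = dilated_conv1d(x, kernel_gate, dilation)
    return [a * sigmoid(b) for a, b in zip(lin, gate)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions, one dilation per layer."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Three stacked size-3 layers: plain versus an assumed 1, 2, 4 dilation schedule.
plain = receptive_field(3, [1, 1, 1])
dilated = receptive_field(3, [1, 2, 4])
```

With the same number of layers and the same kernel size, the dilated stack covers more input frames than the plain one, which is the receptive-field argument in the description. The sigmoid gate lets the layer scale each output position between 0 and 1, one way of emphasizing some features over others.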

