
Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention

Bibliographic Details
Main Authors: Jin, Ye; Wang, Mei; Luo, Liyan; Zhao, Dinghao; Liu, Zhanqi
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9503981/
https://www.ncbi.nlm.nih.gov/pubmed/36146166
http://dx.doi.org/10.3390/s22186818
_version_ 1784796101192187904
author Jin, Ye
Wang, Mei
Luo, Liyan
Zhao, Dinghao
Liu, Zhanqi
author_facet Jin, Ye
Wang, Mei
Luo, Liyan
Zhao, Dinghao
Liu, Zhanqi
author_sort Jin, Ye
collection PubMed
description The complexity of polyphonic sounds imposes numerous challenges on their classification. Especially in real life, polyphonic sound events exhibit discontinuities and unstable time-frequency variations. A single traditional acoustic feature cannot characterize the key information of a polyphonic sound event, and this deficiency results in poor classification performance. In this paper, we propose a convolutional recurrent neural network model based on the temporal-frequency (TF) attention mechanism and the feature space (FS) attention mechanism (TFFS-CRNN). The TFFS-CRNN model takes aggregated Log-Mel spectrogram and MFCC features as input and comprises the TF-attention module, the convolutional recurrent neural network (CRNN) module, the FS-attention module, and the bidirectional gated recurrent unit (BGRU) module. In polyphonic sound event detection (SED), the TF-attention module captures the critical temporal-frequency features more effectively, while the FS-attention module assigns dynamically learnable weights to the different feature dimensions. The TFFS-CRNN model thereby better characterizes the key feature information in polyphonic SED: with the two attention modules, the model can focus on semantically relevant time frames, key frequency bands, and important feature spaces. Finally, the BGRU module learns contextual information. Experiments were conducted on the DCASE 2016 Task 3 and DCASE 2017 Task 3 datasets. The results show that the F1-score of the TFFS-CRNN model improved by 12.4% and 25.2% over the winning systems of the respective DCASE challenges, while the error rate (ER) was reduced by 0.41 and 0.37. The proposed TFFS-CRNN model therefore achieves better classification performance and a lower ER in polyphonic SED.
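
The description above gives the pipeline only in prose (TF-attention on the input features, a CNN front end, FS-attention on the resulting feature space, then a BGRU for context). Below is a minimal PyTorch sketch of such a pipeline, assuming details the abstract does not specify: all module internals, layer sizes, the class count, the input feature size (e.g. 64 Log-Mel bands + 40 MFCCs concatenated per frame), and the names TFAttention, FSAttention, and TFFSCRNN are illustrative guesses, not the authors' published implementation.

```python
# Hypothetical sketch of a TFFS-CRNN-style model for polyphonic SED.
# Layer sizes and attention internals are assumptions, not the paper's code.
import torch
import torch.nn as nn

class TFAttention(nn.Module):
    """Temporal-frequency attention: re-weights each time-frequency bin of
    the input so salient frames and bands dominate the later stages."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, x):                      # x: (batch, 1, time, freq)
        weights = torch.sigmoid(self.conv(x))  # per-bin weights in [0, 1]
        return x * weights

class FSAttention(nn.Module):
    """Feature-space attention: dynamically learnable weights over the
    feature dimensions of the CRNN output."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, time, dim)
        weights = torch.softmax(self.fc(x), dim=-1)
        return x * weights

class TFFSCRNN(nn.Module):
    def __init__(self, n_freq=104, n_classes=6, hidden=128):
        super().__init__()
        self.tf_att = TFAttention()
        # CNN front end pooled along frequency only, preserving the time
        # resolution needed for frame-level event detection.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        feat_dim = 64 * (n_freq // 16)         # channels x pooled freq bins
        self.fs_att = FSAttention(feat_dim)
        self.bgru = nn.GRU(feat_dim, hidden, batch_first=True,
                           bidirectional=True)  # contextual modeling
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                      # x: (batch, 1, time, freq)
        x = self.tf_att(x)
        x = self.cnn(x)                        # (batch, 64, time, freq/16)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x = self.fs_att(x)
        x, _ = self.bgru(x)
        return torch.sigmoid(self.head(x))     # frame-wise multi-label scores

# Assumed input: 64 Log-Mel bands + 40 MFCCs concatenated -> 104 bins/frame.
model = TFFSCRNN(n_freq=104, n_classes=6)
scores = model(torch.randn(8, 1, 256, 104))    # -> (8, 256, 6)
```

Pooling only along frequency is a common CRNN design choice for SED, since the per-frame sigmoid outputs must stay aligned with the annotation grid; the paper may differ in these particulars.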
format Online
Article
Text
id pubmed-9503981
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-95039812022-09-24 Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention Jin, Ye Wang, Mei Luo, Liyan Zhao, Dinghao Liu, Zhanqi Sensors (Basel) Article The complexity of polyphonic sounds imposes numerous challenges on their classification. Especially in real life, polyphonic sound events exhibit discontinuities and unstable time-frequency variations. A single traditional acoustic feature cannot characterize the key information of a polyphonic sound event, and this deficiency results in poor classification performance. In this paper, we propose a convolutional recurrent neural network model based on the temporal-frequency (TF) attention mechanism and the feature space (FS) attention mechanism (TFFS-CRNN). The TFFS-CRNN model takes aggregated Log-Mel spectrogram and MFCC features as input and comprises the TF-attention module, the convolutional recurrent neural network (CRNN) module, the FS-attention module, and the bidirectional gated recurrent unit (BGRU) module. In polyphonic sound event detection (SED), the TF-attention module captures the critical temporal-frequency features more effectively, while the FS-attention module assigns dynamically learnable weights to the different feature dimensions. The TFFS-CRNN model thereby better characterizes the key feature information in polyphonic SED: with the two attention modules, the model can focus on semantically relevant time frames, key frequency bands, and important feature spaces. Finally, the BGRU module learns contextual information. Experiments were conducted on the DCASE 2016 Task 3 and DCASE 2017 Task 3 datasets. The results show that the F1-score of the TFFS-CRNN model improved by 12.4% and 25.2% over the winning systems of the respective DCASE challenges, while the error rate (ER) was reduced by 0.41 and 0.37. The proposed TFFS-CRNN model therefore achieves better classification performance and a lower ER in polyphonic SED. MDPI 2022-09-09 /pmc/articles/PMC9503981/ /pubmed/36146166 http://dx.doi.org/10.3390/s22186818 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
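
The F1 and ER figures cited in the record are the segment-based metrics standard in the DCASE challenges (computed in practice with the sed_eval toolkit). As a reference point, the sketch below shows how segment-based ER and F1 are defined from per-segment substitutions, deletions, and insertions; the function name and array shapes are illustrative assumptions.

```python
# Hypothetical sketch of segment-based ER and F1 for polyphonic SED,
# following the DCASE definition (Mesaros et al., 2016). Not the official
# sed_eval implementation.
import numpy as np

def segment_metrics(ref, est):
    """ref, est: binary activity matrices of shape (n_segments, n_classes)."""
    tp = np.logical_and(ref, est).sum(axis=1)                  # hits/segment
    fn = np.logical_and(ref, np.logical_not(est)).sum(axis=1)  # misses
    fp = np.logical_and(np.logical_not(ref), est).sum(axis=1)  # false alarms
    S = np.minimum(fn, fp).sum()        # substitutions: paired miss + alarm
    D = np.maximum(0, fn - fp).sum()    # deletions: unpaired misses
    I = np.maximum(0, fp - fn).sum()    # insertions: unpaired alarms
    N = max(ref.sum(), 1)               # active reference events (guard /0)
    er = (S + D + I) / N
    f1 = 2 * tp.sum() / max(2 * tp.sum() + fp.sum() + fn.sum(), 1)
    return er, f1
```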
spellingShingle Article
Jin, Ye
Wang, Mei
Luo, Liyan
Zhao, Dinghao
Liu, Zhanqi
Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention
title Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention
title_full Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention
title_fullStr Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention
title_full_unstemmed Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention
title_short Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention
title_sort polyphonic sound event detection using temporal-frequency attention and feature space attention
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9503981/
https://www.ncbi.nlm.nih.gov/pubmed/36146166
http://dx.doi.org/10.3390/s22186818
work_keys_str_mv AT jinye polyphonicsoundeventdetectionusingtemporalfrequencyattentionandfeaturespaceattention
AT wangmei polyphonicsoundeventdetectionusingtemporalfrequencyattentionandfeaturespaceattention
AT luoliyan polyphonicsoundeventdetectionusingtemporalfrequencyattentionandfeaturespaceattention
AT zhaodinghao polyphonicsoundeventdetectionusingtemporalfrequencyattentionandfeaturespaceattention
AT liuzhanqi polyphonicsoundeventdetectionusingtemporalfrequencyattentionandfeaturespaceattention