Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention
The complexity of polyphonic sounds imposes numerous challenges on their classification. Especially in real life, polyphonic sound events have discontinuity and unstable time-frequency variations. Traditional single acoustic features cannot characterize the key feature information of the polyphonic...
Main Authors: Jin, Ye; Wang, Mei; Luo, Liyan; Zhao, Dinghao; Liu, Zhanqi
Format: Online Article Text
Language: English
Published: MDPI, 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9503981/ https://www.ncbi.nlm.nih.gov/pubmed/36146166 http://dx.doi.org/10.3390/s22186818
_version_ | 1784796101192187904 |
author | Jin, Ye Wang, Mei Luo, Liyan Zhao, Dinghao Liu, Zhanqi |
author_facet | Jin, Ye Wang, Mei Luo, Liyan Zhao, Dinghao Liu, Zhanqi |
author_sort | Jin, Ye |
collection | PubMed |
description | The complexity of polyphonic sounds imposes numerous challenges on their classification. Especially in real life, polyphonic sound events exhibit discontinuity and unstable time-frequency variations. Traditional single acoustic features cannot characterize the key feature information of polyphonic sound events, and this deficiency results in poor model classification performance. In this paper, we propose a convolutional recurrent neural network model based on the temporal-frequency (TF) attention mechanism and the feature space (FS) attention mechanism (TFFS-CRNN). The TFFS-CRNN model aggregates Log-Mel spectrograms and MFCC features as inputs and contains the TF-attention module, the convolutional recurrent neural network (CRNN) module, the FS-attention module, and the bidirectional gated recurrent unit (BGRU) module. In polyphonic sound event detection (SED), the TF-attention module captures the critical temporal-frequency features more effectively. The FS-attention module assigns dynamically learnable weights to the different dimensions of the features. The TFFS-CRNN model thus improves the characterization of key feature information in polyphonic SED. By using the two attention modules, the model can focus on semantically relevant time frames, key frequency bands, and important feature spaces. Finally, the BGRU module learns contextual information. Experiments were conducted on the DCASE 2016 Task 3 and DCASE 2017 Task 3 datasets. The results show that the F1-score of the TFFS-CRNN model improved by 12.4% and 25.2% over the winning systems of the respective DCASE challenges, while the error rate (ER) was reduced by 0.41 and 0.37. The proposed TFFS-CRNN model achieves better classification performance and a lower ER in polyphonic SED. |
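The two attention mechanisms summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the softmax-gated score vectors below are random stand-ins for the layers the model learns during training, and the function names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax: weights are positive and sum to 1.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tf_attention(spec, time_scores, freq_scores):
    """Scale each (time, frequency) bin by attention weights over frames
    and bands; in the paper these scores come from learned layers."""
    time_w = softmax(time_scores)              # one weight per frame
    freq_w = softmax(freq_scores)              # one weight per band
    return spec * time_w[:, None] * freq_w[None, :]

def fs_attention(features, chan_scores):
    """Re-weight feature dimensions with a softmax gate, mimicking the
    dynamically learnable per-dimension weights of the FS-attention module."""
    return features * softmax(chan_scores)

# 40 frames x 64 bands, standing in for an aggregated Log-Mel/MFCC input.
spec = rng.standard_normal((40, 64))
attended = tf_attention(spec, rng.standard_normal(40), rng.standard_normal(64))
gated = fs_attention(attended, rng.standard_normal(64))
```

Both operations preserve the input shape; they only rescale time frames, frequency bands, and feature dimensions so that later layers emphasize the semantically relevant ones.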
format | Online Article Text |
id | pubmed-9503981 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-95039812022-09-24 Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention Jin, Ye Wang, Mei Luo, Liyan Zhao, Dinghao Liu, Zhanqi Sensors (Basel) Article
MDPI 2022-09-09 /pmc/articles/PMC9503981/ /pubmed/36146166 http://dx.doi.org/10.3390/s22186818 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Jin, Ye Wang, Mei Luo, Liyan Zhao, Dinghao Liu, Zhanqi Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention |
title | Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention |
title_full | Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention |
title_fullStr | Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention |
title_full_unstemmed | Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention |
title_short | Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention |
title_sort | polyphonic sound event detection using temporal-frequency attention and feature space attention |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9503981/ https://www.ncbi.nlm.nih.gov/pubmed/36146166 http://dx.doi.org/10.3390/s22186818 |
work_keys_str_mv | AT jinye polyphonicsoundeventdetectionusingtemporalfrequencyattentionandfeaturespaceattention AT wangmei polyphonicsoundeventdetectionusingtemporalfrequencyattentionandfeaturespaceattention AT luoliyan polyphonicsoundeventdetectionusingtemporalfrequencyattentionandfeaturespaceattention AT zhaodinghao polyphonicsoundeventdetectionusingtemporalfrequencyattentionandfeaturespaceattention AT liuzhanqi polyphonicsoundeventdetectionusingtemporalfrequencyattentionandfeaturespaceattention |