
High Accurate Environmental Sound Classification: Sub-Spectrogram Segmentation versus Temporal-Frequency Attention Mechanism

In the important and challenging field of environmental sound classification (ESC), a crucial and even decisive factor is the feature representation ability, which can directly affect the accuracy of classification. Therefore, the classification performance often depends to a large extent on whether...


Bibliographic Details
Main Authors: Qiao, Tianhao; Zhang, Shunqing; Cao, Shan; Xu, Shugong
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8400609/
https://www.ncbi.nlm.nih.gov/pubmed/34450942
http://dx.doi.org/10.3390/s21165500
author Qiao, Tianhao
Zhang, Shunqing
Cao, Shan
Xu, Shugong
collection PubMed
description In the important and challenging field of environmental sound classification (ESC), a crucial and even decisive factor is the feature representation ability, which can directly affect the accuracy of classification. Therefore, the classification performance often depends to a large extent on whether the effective representative features can be extracted from the environmental sound. In this paper, we firstly propose a sub-spectrogram segmentation with score level fusion based ESC classification framework, and we adopt the proposed convolutional recurrent neural network (CRNN) for improving the classification accuracy. By evaluating numerous truncation schemes, we numerically figure out the optimal number of sub-spectrograms and the corresponding band ranges, and, on this basis, we propose a joint attention mechanism with temporal and frequency attention mechanisms and use the global attention mechanism when generating the attention map. Finally, the numerical results show that the two frameworks we proposed can achieve 82.1% and 86.4% classification accuracy on the public environmental sound dataset ESC-50, respectively, which is equivalent to more than 13.5% improvement over the traditional baseline scheme.
format Online
Article
Text
id pubmed-8400609
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8400609 2021-08-29 High Accurate Environmental Sound Classification: Sub-Spectrogram Segmentation versus Temporal-Frequency Attention Mechanism Qiao, Tianhao Zhang, Shunqing Cao, Shan Xu, Shugong Sensors (Basel) Article In the important and challenging field of environmental sound classification (ESC), a crucial and even decisive factor is the feature representation ability, which can directly affect the accuracy of classification. Therefore, the classification performance often depends to a large extent on whether the effective representative features can be extracted from the environmental sound. In this paper, we firstly propose a sub-spectrogram segmentation with score level fusion based ESC classification framework, and we adopt the proposed convolutional recurrent neural network (CRNN) for improving the classification accuracy. By evaluating numerous truncation schemes, we numerically figure out the optimal number of sub-spectrograms and the corresponding band ranges, and, on this basis, we propose a joint attention mechanism with temporal and frequency attention mechanisms and use the global attention mechanism when generating the attention map. Finally, the numerical results show that the two frameworks we proposed can achieve 82.1% and 86.4% classification accuracy on the public environmental sound dataset ESC-50, respectively, which is equivalent to more than 13.5% improvement over the traditional baseline scheme. MDPI 2021-08-16 /pmc/articles/PMC8400609/ /pubmed/34450942 http://dx.doi.org/10.3390/s21165500 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title High Accurate Environmental Sound Classification: Sub-Spectrogram Segmentation versus Temporal-Frequency Attention Mechanism
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8400609/
https://www.ncbi.nlm.nih.gov/pubmed/34450942
http://dx.doi.org/10.3390/s21165500