Environmental sound classification using temporal-frequency attention based convolutional neural network
Environmental sound classification is one of the important issues in the audio recognition field. Compared with structured sounds such as speech and music, the time–frequency structure of environmental sounds is more complicated. In order to learn time and frequency features from Log-Mel spectrogram...
Main authors: | Mu, Wenjie; Yin, Bo; Huang, Xianqing; Xu, Jiali; Du, Zehua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8566500/ https://www.ncbi.nlm.nih.gov/pubmed/34732762 http://dx.doi.org/10.1038/s41598-021-01045-4 |
_version_ | 1784594026624712704 |
---|---|
author | Mu, Wenjie; Yin, Bo; Huang, Xianqing; Xu, Jiali; Du, Zehua |
collection | PubMed |
description | Environmental sound classification is one of the important issues in the audio recognition field. Compared with structured sounds such as speech and music, the time–frequency structure of environmental sounds is more complicated. To learn time and frequency features from the Log-Mel spectrogram more effectively, this paper proposes a temporal-frequency attention based convolutional neural network model (TFCNN). First, a motivating experiment is designed to verify the effect of a specific frequency band in the spectrogram on model classification. Second, two new attention mechanisms are proposed: a temporal attention mechanism and a frequency attention mechanism. These mechanisms focus on key frequency bands and semantically related time frames in the spectrogram, reducing the influence of background noise and irrelevant frequency bands. The two mechanisms are then combined so that their feature information is complementary, which captures the critical time–frequency features more accurately. In this way, the representation ability of the network model can be greatly improved. Finally, experiments on two public data sets, UrbanSound8K and ESC-50, demonstrate the effectiveness of the proposed method. |
format | Online Article Text |
id | pubmed-8566500 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8566500 2021-11-04 Sci Rep Article Nature Publishing Group UK 2021-11-03 /pmc/articles/PMC8566500/ /pubmed/34732762 http://dx.doi.org/10.1038/s41598-021-01045-4 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
title | Environmental sound classification using temporal-frequency attention based convolutional neural network |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8566500/ https://www.ncbi.nlm.nih.gov/pubmed/34732762 http://dx.doi.org/10.1038/s41598-021-01045-4 |
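
The description above mentions attention over frequency bands and time frames of a spectrogram, but this record does not say how the weights are computed. The following is only a rough sketch of the general idea, not the TFCNN architecture: TFCNN learns its attention inside a convolutional network, whereas this sketch scores each band and frame by its mean energy and combines the two weightings by multiplication. Both choices, along with all function names and the toy data, are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn a list of scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frequency_attention(spec):
    """One weight per frequency band (row), from its mean energy.

    Assumption: mean pooling stands in for a learned scoring function.
    """
    return softmax([sum(band) / len(band) for band in spec])


def temporal_attention(spec):
    """One weight per time frame (column), from its mean energy."""
    n_bands, n_frames = len(spec), len(spec[0])
    frame_scores = [
        sum(spec[f][t] for f in range(n_bands)) / n_bands
        for t in range(n_frames)
    ]
    return softmax(frame_scores)


def apply_attention(spec):
    """Rescale every time-frequency bin by both weights.

    Assumption: the two weightings are combined by multiplication.
    """
    fw = frequency_attention(spec)
    tw = temporal_attention(spec)
    return [
        [spec[f][t] * fw[f] * tw[t] for t in range(len(tw))]
        for f in range(len(fw))
    ]


# Hypothetical 3-band x 4-frame spectrogram with a short event in band 1.
toy_spec = [
    [0.1, 0.1, 0.1, 0.1],
    [0.2, 3.0, 3.2, 0.1],
    [0.1, 0.2, 0.1, 0.1],
]
freq_weights = frequency_attention(toy_spec)
time_weights = temporal_attention(toy_spec)
attended = apply_attention(toy_spec)
```

In this toy input the band and the frames that contain the event receive the largest weights, so the quieter background bins are suppressed relative to the event.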