
Attention Based Convolutional Neural Network with Multi-frequency Resolution Feature for Environment Sound Classification

Environmental sound classification is an important research problem in intelligent audio monitoring and related fields. This paper proposes a novel multi-frequency resolution (MFR) feature to address the problem that the existing single frequency resolution time–frequency features...
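The full description in this record says the MFR feature combines three features with different frequency resolutions, each compressed to a different degree along the time dimension. The sketch below illustrates that idea only. It is not the authors' implementation: the plain log-magnitude STFT (instead of Log-Mel, Cochleagram, or Constant Q-Transform), the window sizes, the hop length, the pooling factors, and all function names are assumptions chosen for illustration, and the record does not state how the three maps are merged, so they are returned separately.

```python
import numpy as np

def log_spectrogram(signal, n_fft, hop):
    """Log-magnitude STFT with a Hann window; returns (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (frames, n_fft // 2 + 1)
    return np.log1p(magnitude).T

def compress_time(spec, factor):
    """Average-pool a (freq_bins, frames) map along time by an integer factor."""
    usable = (spec.shape[1] // factor) * factor  # drop the incomplete last group
    return spec[:, :usable].reshape(spec.shape[0], -1, factor).mean(axis=2)

def multi_resolution_features(signal, n_ffts=(256, 512, 1024),
                              factors=(1, 2, 4), hop=128):
    """Hypothetical MFR-like stack: a larger window gives finer frequency
    resolution, and each map is pooled in time by its own factor (assumed)."""
    return [
        compress_time(log_spectrogram(signal, n_fft, hop), factor)
        for n_fft, factor in zip(n_ffts, factors)
    ]

# Usage: a 1 s, 1 kHz test tone sampled at 8 kHz (synthetic data, not a dataset clip).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
features = multi_resolution_features(tone)
for feature in features:
    print(feature.shape)
```

Each map has a different number of frequency bins and time frames, which is why a real pipeline would need a further step (resizing, padding, or stacking as channels) before feeding them to a network.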


Bibliographic Details
Main Authors:	Li, Minze; Huang, Wu; Zhang, Tao
Format:	Online Article Text
Language:	English
Published:	Springer US 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9589621/ https://www.ncbi.nlm.nih.gov/pubmed/36312843 http://dx.doi.org/10.1007/s11063-022-11041-y

_version_	1784814344124497920
author	Li, Minze; Huang, Wu; Zhang, Tao
author_facet	Li, Minze; Huang, Wu; Zhang, Tao
author_sort	Li, Minze
collection	PubMed
description	Environmental sound classification is an important research problem in intelligent audio monitoring and related fields. This paper proposes a novel multi-frequency resolution (MFR) feature to address the problem that existing time–frequency features with a single frequency resolution cannot effectively express the characteristics of multiple types of sound. The MFR feature is composed of three features with different frequency resolutions, each compressed to a different degree along the time dimension. This construction acts as a form of data augmentation and also captures more context information during feature extraction. The MFR features of the Log-Mel Spectrogram, Cochleagram, and Constant Q-Transform are combined to form a multi-channel MFR feature. In addition, a network named SacNet is built to address the problem that the time–frequency feature map of a sound contains a large amount of invalid information. The basic structural unit of SacNet consists of two parallel branches: one uses depthwise separable convolution as the main feature extractor, and the other uses a spatial attention module to extract more effective information. Experimental results show that the proposed method achieves state-of-the-art accuracies of 97.5%, 93.1%, and 95.3% on the three benchmark datasets ESC10, ESC50, and UrbanSound8K, respectively, improvements of 3.3%, 0.5%, and 2.3% over previous advanced methods.
format	Online Article Text
id	pubmed-9589621
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-9589621 2022-10-24 Attention Based Convolutional Neural Network with Multi-frequency Resolution Feature for Environment Sound Classification Li, Minze; Huang, Wu; Zhang, Tao Neural Process Lett Article Springer US 2022-10-24 /pmc/articles/PMC9589621/ /pubmed/36312843 http://dx.doi.org/10.1007/s11063-022-11041-y Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	Attention Based Convolutional Neural Network with Multi-frequency Resolution Feature for Environment Sound Classification
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9589621/ https://www.ncbi.nlm.nih.gov/pubmed/36312843 http://dx.doi.org/10.1007/s11063-022-11041-y
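The description field states that the basic unit of SacNet has two parallel branches, one built on depthwise separable convolution and one on a spatial attention module. The NumPy sketch below shows one plausible reading of such a unit. It is an illustration under assumptions, not the published architecture: the CBAM-style attention (channel-wise mean and max pooling followed by a convolution and a sigmoid), the ReLU, the element-wise multiplication used to merge the branches, the kernel sizes, and every function name are hypothetical, since the record does not specify them.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Single-channel 2-D cross-correlation, zero padded so output size == input size."""
    kh, kw = kernel.shape  # odd kernel sizes assumed
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i : i + x.shape[0], j : j + x.shape[1]]
    return out

def depthwise_separable_conv(x, depthwise, pointwise):
    """x: (C_in, H, W); depthwise: (C_in, k, k); pointwise: (C_out, C_in)."""
    # Depthwise step: one spatial filter per input channel, no channel mixing.
    per_channel = np.stack(
        [conv2d_same(x[c], depthwise[c]) for c in range(x.shape[0])]
    )
    # Pointwise step: a 1x1 convolution that mixes channels.
    return np.tensordot(pointwise, per_channel, axes=([1], [0]))

def spatial_attention(x, kernel_avg, kernel_max):
    """Assumed CBAM-style mask: pool over channels, convolve, apply a sigmoid."""
    logits = (conv2d_same(x.mean(axis=0), kernel_avg)
              + conv2d_same(x.max(axis=0), kernel_max))
    return 1.0 / (1.0 + np.exp(-logits))  # (H, W), values in (0, 1)

def two_branch_block(x, depthwise, pointwise, kernel_avg, kernel_max):
    """Run both branches on the same input; merging by multiplication is assumed."""
    feats = np.maximum(depthwise_separable_conv(x, depthwise, pointwise), 0.0)  # ReLU
    mask = spatial_attention(x, kernel_avg, kernel_max)
    return feats * mask  # the (H, W) mask broadcasts over output channels

# Usage with random weights on a 3-channel, 16x20 time-frequency map.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16, 20))
depthwise = 0.1 * rng.standard_normal((3, 3, 3))
pointwise = 0.1 * rng.standard_normal((8, 3))
kernel_avg = 0.1 * rng.standard_normal((7, 7))
kernel_max = 0.1 * rng.standard_normal((7, 7))
mask = spatial_attention(x, kernel_avg, kernel_max)
out = two_branch_block(x, depthwise, pointwise, kernel_avg, kernel_max)
print(out.shape)
```

With these sizes the depthwise separable branch uses 3·9 + 8·3 = 51 weights, against 8·3·9 = 216 for a standard 3×3 convolution with the same input and output channels, which is the usual motivation for this kind of feature extractor.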
