Acoustic scene classification based on three-dimensional multi-channel feature-correlated deep learning networks
Main authors: | Qu, Yuanyuan; Li, Xuesheng; Qin, Zhiliang; Lu, Qidong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9374676/ https://www.ncbi.nlm.nih.gov/pubmed/35962021 http://dx.doi.org/10.1038/s41598-022-17863-z |
author | Qu, Yuanyuan; Li, Xuesheng; Qin, Zhiliang; Lu, Qidong |
---|---|
collection | PubMed |
description | As an effective way to perceive environments, acoustic scene classification (ASC) has received considerable attention in the past few years. ASC is generally regarded as a challenging task because of the subtle differences between classes of environmental sounds. In this paper, we propose a novel approach to accurate classification based on aggregating spatial–temporal features extracted by a multi-branch three-dimensional (3D) convolutional neural network (CNN) model. The novelties of this paper are as follows. First, we form multiple frequency-domain representations of signals by drawing on expert knowledge of acoustics and on discrete wavelet transforms (DWT). Second, we propose a novel 3D CNN architecture with residual connections and squeeze-and-excitation attention (3D-SE-ResNet) to capture both the long-term and short-term correlations inherent in environmental sounds. Third, an auxiliary supervised branch based on the chromagram of the original signal is incorporated into the proposed architecture; it supplies supplementary information to the model and thereby reduces the risk of overfitting. The proposed multi-input, multi-feature 3D-CNN architecture is evaluated numerically on a typical large-scale dataset from the 2019 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2019) and is shown to achieve noticeable performance gains over state-of-the-art methods in the literature. |
format | Online Article Text |
id | pubmed-9374676 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9374676, 2022-08-14, Sci Rep, Article.
Nature Publishing Group UK 2022-08-12 /pmc/articles/PMC9374676/ /pubmed/35962021 http://dx.doi.org/10.1038/s41598-022-17863-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Acoustic scene classification based on three-dimensional multi-channel feature-correlated deep learning networks |
topic | Article |
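The squeeze-and-excitation (SE) recalibration that the description attributes to the 3D-SE-ResNet can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the channel-first nested-list layout (channels × depth × height × width), the bias-free two-layer bottleneck, and the function names `squeeze`, `excite`, `se_block_3d` and `se_residual_3d` are hypothetical, and the block's 3D convolutions are omitted (`branch_output` stands in for their result).

```python
import math
from typing import List

# Assumed layout (hypothetical): a feature map is a list of channels, and each
# channel is a 3D volume (depth x height x width) stored as nested float lists.
Volume = List[List[List[float]]]
FeatureMap = List[Volume]
Matrix = List[List[float]]


def squeeze(feature_map: FeatureMap) -> List[float]:
    """Global average pooling: one descriptor per channel over all 3D positions."""
    descriptors = []
    for volume in feature_map:
        values = [v for plane in volume for row in plane for v in row]
        descriptors.append(sum(values) / len(values))
    return descriptors


def matvec(weights: Matrix, vector: List[float]) -> List[float]:
    """Bias-free dense layer; weights has shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def excite(z: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Bottleneck MLP: ReLU(W1 z), then a sigmoid of W2 gives per-channel gates in (0, 1)."""
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    return [1.0 / (1.0 + math.exp(-a)) for a in matvec(w2, hidden)]


def se_block_3d(feature_map: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Multiply every value of each channel by that channel's gate."""
    gates = excite(squeeze(feature_map), w1, w2)
    return [
        [[[v * g for v in row] for row in plane] for plane in volume]
        for volume, g in zip(feature_map, gates)
    ]


def se_residual_3d(
    identity: FeatureMap, branch_output: FeatureMap, w1: Matrix, w2: Matrix
) -> FeatureMap:
    """SE-ResNet-style merge: ReLU(identity + SE(branch_output)).

    branch_output stands in for the result of the block's 3D convolutions,
    which this sketch does not implement.
    """
    recalibrated = se_block_3d(branch_output, w1, w2)
    return [
        [
            [[max(0.0, a + b) for a, b in zip(row_i, row_r)]
             for row_i, row_r in zip(plane_i, plane_r)]
            for plane_i, plane_r in zip(vol_i, vol_r)
        ]
        for vol_i, vol_r in zip(identity, recalibrated)
    ]


if __name__ == "__main__":
    # Two channels, each a 1 x 2 x 2 volume; bottleneck from 2 channels to 1 hidden unit.
    x = [[[[1.0, 1.0], [1.0, 1.0]]], [[[3.0, 3.0], [3.0, 3.0]]]]
    w1 = [[0.0, 0.0]]      # shape (1, 2)
    w2 = [[0.0], [0.0]]    # shape (2, 1)
    print(squeeze(x))                  # [1.0, 3.0]
    print(se_block_3d(x, w1, w2))      # every gate is sigmoid(0) = 0.5
```

With all-zero weights every gate equals sigmoid(0) = 0.5, so `se_block_3d` simply halves each channel. With trained weights the gates differ per channel, which is how an SE block emphasizes some feature channels and suppresses others before the residual addition.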