
Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion

Bibliographic Details
Main Authors: Su, Yu, Zhang, Ke, Wang, Jingyu, Madani, Kurosh
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6479959/
https://www.ncbi.nlm.nih.gov/pubmed/30978974
http://dx.doi.org/10.3390/s19071733
_version_ 1783413465623298048
author Su, Yu
Zhang, Ke
Wang, Jingyu
Madani, Kurosh
author_facet Su, Yu
Zhang, Ke
Wang, Jingyu
Madani, Kurosh
author_sort Su, Yu
collection PubMed
description With the popularity of using deep learning-based models in various categorization problems and their proven robustness compared to conventional methods, a growing number of researchers have exploited such methods in environment sound classification (ESC) tasks in recent years. However, the performance of existing models that use auditory features such as the log-mel spectrogram (LM) and mel-frequency cepstral coefficients (MFCC), or the raw waveform, to train deep neural networks for ESC is unsatisfactory. In this paper, we first propose two combined features to give a more comprehensive representation of environment sounds. Then, a four-layer convolutional neural network (CNN) is presented to improve the performance of ESC with the proposed aggregated features. Finally, the CNNs trained with the different features are fused using the Dempster–Shafer evidence theory to compose the TSCNN-DS model. The experimental results indicate that our combined features with the four-layer CNN are appropriate for environment sound taxonomic problems and dramatically outperform other conventional methods. The proposed TSCNN-DS model achieves a classification accuracy of 97.2%, which is the highest taxonomic accuracy on the UrbanSound8K dataset compared to existing models.
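As a rough illustration of the decision-level fusion described in the abstract, the sketch below (not taken from the paper) combines the softmax outputs of two classifier streams with Dempster's rule, under the simplifying assumption that each stream assigns all belief mass to singleton classes; in that case the rule reduces to a normalized element-wise product. The function and variable names are hypothetical.

```python
# Minimal sketch, assuming singleton-only belief masses (not the authors' code).
import numpy as np

def ds_fuse(p_a: np.ndarray, p_b: np.ndarray) -> np.ndarray:
    """Fuse two per-class probability vectors with Dempster's rule of combination.

    When every basic probability assignment sits on a single class, the combined
    mass for class i is p_a[i] * p_b[i] divided by (1 - K), where K is the mass
    assigned to conflicting class pairs.
    """
    joint = p_a * p_b                  # agreement mass per class
    conflict = 1.0 - joint.sum()       # K: mass on conflicting hypotheses
    if np.isclose(conflict, 1.0):
        raise ValueError("Total conflict: the two sources are contradictory")
    return joint / joint.sum()         # normalize by (1 - K)

# Example: two streams mildly disagree; fusion sharpens the shared winner.
p_stream_a = np.array([0.70, 0.20, 0.10])   # e.g. CNN on one feature set
p_stream_b = np.array([0.60, 0.30, 0.10])   # e.g. CNN on the other feature set
print(ds_fuse(p_stream_a, p_stream_b))      # -> approx [0.857, 0.122, 0.020]
```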
format Online
Article
Text
id pubmed-6479959
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6479959 2019-04-29 Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion Su, Yu Zhang, Ke Wang, Jingyu Madani, Kurosh Sensors (Basel) Article With the popularity of using deep learning-based models in various categorization problems and their proven robustness compared to conventional methods, a growing number of researchers have exploited such methods in environment sound classification (ESC) tasks in recent years. However, the performance of existing models that use auditory features such as the log-mel spectrogram (LM) and mel-frequency cepstral coefficients (MFCC), or the raw waveform, to train deep neural networks for ESC is unsatisfactory. In this paper, we first propose two combined features to give a more comprehensive representation of environment sounds. Then, a four-layer convolutional neural network (CNN) is presented to improve the performance of ESC with the proposed aggregated features. Finally, the CNNs trained with the different features are fused using the Dempster–Shafer evidence theory to compose the TSCNN-DS model. The experimental results indicate that our combined features with the four-layer CNN are appropriate for environment sound taxonomic problems and dramatically outperform other conventional methods. The proposed TSCNN-DS model achieves a classification accuracy of 97.2%, which is the highest taxonomic accuracy on the UrbanSound8K dataset compared to existing models. MDPI 2019-04-11 /pmc/articles/PMC6479959/ /pubmed/30978974 http://dx.doi.org/10.3390/s19071733 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Su, Yu
Zhang, Ke
Wang, Jingyu
Madani, Kurosh
Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion
title Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion
title_full Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion
title_fullStr Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion
title_full_unstemmed Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion
title_short Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion
title_sort environment sound classification using a two-stream cnn based on decision-level fusion
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6479959/
https://www.ncbi.nlm.nih.gov/pubmed/30978974
http://dx.doi.org/10.3390/s19071733
work_keys_str_mv AT suyu environmentsoundclassificationusingatwostreamcnnbasedondecisionlevelfusion
AT zhangke environmentsoundclassificationusingatwostreamcnnbasedondecisionlevelfusion
AT wangjingyu environmentsoundclassificationusingatwostreamcnnbasedondecisionlevelfusion
AT madanikurosh environmentsoundclassificationusingatwostreamcnnbasedondecisionlevelfusion