A CNN Sound Classification Mechanism Using Data Augmentation

Sound classification has been widely used in many fields. Unlike traditional signal-processing methods, using deep learning technology for sound classification is one of the most feasible and effective methods. However, limited by the quality of the training dataset, such as cost and resource constr...

Bibliographic Details
Main authors:	Chu, Hung-Chi; Zhang, Young-Lin; Chiang, Hao-Chu
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10422379/ https://www.ncbi.nlm.nih.gov/pubmed/37571755 http://dx.doi.org/10.3390/s23156972

author	Chu, Hung-Chi; Zhang, Young-Lin; Chiang, Hao-Chu
author_sort	Chu, Hung-Chi
collection	PubMed
description	Sound classification is widely used in many fields. Compared with traditional signal-processing methods, deep learning is one of the most feasible and effective approaches to sound classification. However, classification performance suffers when the training dataset is limited by cost and resource constraints, data imbalance, and data annotation issues. We therefore propose a sound classification mechanism based on convolutional neural networks (CNNs) that uses Mel-Frequency Cepstral Coefficient (MFCC) feature extraction to convert sound signals into spectrograms, which are suitable as input for CNN models. For data augmentation, the number of spectrograms is increased by setting the number of triangular bandpass filters. Experimental results show that on the ESC-50 dataset, which has 50 semantic categories of complex types and an insufficient amount of data, the classification accuracy is only 63%; with the proposed data augmentation method (K = 5), the accuracy increases to 97%. On the UrbanSound8K dataset, where the amount of data is sufficient, the classification accuracy reaches 90% and rises slightly to 92% with data augmentation. When only 50% of the training dataset is used together with data augmentation, model training is accelerated and the classification accuracy reaches 91%.
format	Online Article Text
id	pubmed-10422379
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10422379 2023-08-13 Sensors (Basel) Article MDPI 2023-08-05 /pmc/articles/PMC10422379/ /pubmed/37571755 http://dx.doi.org/10.3390/s23156972 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	A CNN Sound Classification Mechanism Using Data Augmentation
title_sort	cnn sound classification mechanism using data augmentation
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10422379/ https://www.ncbi.nlm.nih.gov/pubmed/37571755 http://dx.doi.org/10.3390/s23156972
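
The description above says the number of spectrograms is increased by setting the number of triangular bandpass filters. The sketch below illustrates one plausible reading of that idea and is not the authors' implementation: it builds triangular mel filterbanks of several sizes and applies each to the same power spectrum, so one frame yields several feature vectors. The function names, the filter counts, the 512-point FFT size, and the 16 kHz sample rate are all assumptions made for illustration, and the sketch stops at log filterbank energies (a full MFCC pipeline would apply a discrete cosine transform afterwards).

```python
import math


def hz_to_mel(hz):
    # Common mel-scale formula (2595 * log10(1 + f / 700)).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft=512, sample_rate=16000):
    """Return n_filters triangular filters over n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, evenly spaced on the mel scale.
    edges_hz = [mel_to_hz(top_mel * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            # Triangle: rises to 1 at the center, zero outside [left, right].
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_energies(power_spectrum, bank):
    """Apply each filter to one power-spectrum frame and take the log."""
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + 1e-10)
            for row in bank]


def augmented_features(power_spectrum, filter_counts,
                       n_fft=512, sample_rate=16000):
    """One feature vector per filter count (hypothetical augmentation step)."""
    return [log_mel_energies(power_spectrum,
                             mel_filterbank(n, n_fft, sample_rate))
            for n in filter_counts]


if __name__ == "__main__":
    flat_spectrum = [1.0] * 257            # placeholder frame, not real audio
    counts = [20, 26, 32, 40, 64]          # illustrative filter counts only
    for n, feats in zip(counts, augmented_features(flat_spectrum, counts)):
        print(n, len(feats))
```

Each filter count produces a feature vector of a different length, so in practice the resulting spectrograms would need resizing or padding to a common shape before being fed to a single CNN; how the paper handles this is not stated in the record.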

Similar Items