A CNN Sound Classification Mechanism Using Data Augmentation

Bibliographic Details
Main authors: Chu, Hung-Chi, Zhang, Young-Lin, Chiang, Hao-Chu
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10422379/
https://www.ncbi.nlm.nih.gov/pubmed/37571755
http://dx.doi.org/10.3390/s23156972
collection PubMed
description Sound classification is widely used in many fields. Compared with traditional signal-processing methods, deep learning is one of the most feasible and effective approaches to sound classification. However, its performance is limited by the quality of the training dataset, which is affected by cost and resource constraints, data imbalance, and annotation issues. We therefore propose a sound classification mechanism based on convolutional neural networks (CNNs) that uses Mel-Frequency Cepstral Coefficients (MFCCs) as the feature extraction method to convert sound signals into spectrograms, which are well suited as CNN input. For data augmentation, the number of spectrograms is increased by varying the number of triangular bandpass filters. On the ESC-50 dataset, which has 50 complex semantic categories and too little data, the classification accuracy is only 63%; with the proposed data augmentation method (K = 5), it rises to 97%. On the UrbanSound8K dataset, where the data are sufficient, the classification accuracy reaches 90% and increases slightly to 92% with data augmentation. Moreover, when only 50% of the training dataset is used together with data augmentation, model training is faster and the classification accuracy still reaches 91%.
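The augmentation idea in the abstract (more spectrograms obtained by changing the number of triangular bandpass filters) can be illustrated with a minimal pure-Python sketch. It assumes one MFCC matrix is computed per filter-bank size, so K filter counts give K views of the same clip. Everything specific here is a hypothetical choice for illustration, not a value or routine from the paper: the function names (`mel_filterbank`, `augment_by_filter_count`, etc.), the HTK mel formula, the Hamming window, frame length 256, hop 128, 13 coefficients, and the filter counts used below. A naive DFT keeps the block dependency-free; a real pipeline would more likely use an FFT library such as `librosa.feature.mfcc`, whose `n_mels` argument sets the number of mel filters.

```python
import math

# Assumption: HTK-style mel scale; the abstract does not state the exact formula.
def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """n_filters triangular bandpass filters, evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points: filter j rises from point j-1, peaks at j, falls to j+1.
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * n_bins
        for k in range(left, center):  # rising edge of the triangle
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            weights[k] = (right - k) / (right - center)
        bank.append(weights)
    return bank

def power_spectrum(frame):
    """Hamming-windowed |DFT|^2 / N for bins 0..N//2 (naive O(N^2) DFT)."""
    n_fft = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_fft - 1)))
                for n, x in enumerate(frame)]
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * n / n_fft) for n, x in enumerate(windowed))
        im = sum(x * math.sin(2 * math.pi * k * n / n_fft) for n, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum

def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II of the log filter energies; keeps the first n_coeffs terms."""
    n = len(values)
    return [sum(v * math.cos(math.pi * q * (i + 0.5) / n) for i, v in enumerate(values))
            for q in range(n_coeffs)]

def mfcc_from_spectrum(spectrum, bank, n_coeffs):
    energies = [sum(w * p for w, p in zip(filt, spectrum)) for filt in bank]
    log_energies = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct_ii(log_energies, n_coeffs)

def augment_by_filter_count(signal, sample_rate, filter_counts,
                            n_coeffs=13, frame_len=256, hop=128):
    """Hypothetical helper: K = len(filter_counts) MFCC matrices (frames x n_coeffs)."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    # The power spectra do not depend on the filter bank, so compute them once.
    spectra = [power_spectrum(signal[s:s + frame_len]) for s in starts]
    views = []
    for n_filters in filter_counts:
        bank = mel_filterbank(n_filters, frame_len, sample_rate)
        views.append([mfcc_from_spectrum(spec, bank, n_coeffs) for spec in spectra])
    return views
```

For example, `augment_by_filter_count(signal, 8000, (20, 24, 28, 32, 40))` would return five MFCC matrices for one clip (K = 5 under these hypothetical filter counts). Under this reading, each matrix could be fed to the CNN as a separate labelled example, which is how a small dataset such as ESC-50 would gain training samples.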
id pubmed-10422379
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Sensors (Basel), MDPI, published 2023-08-05. © 2023 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).