Data augmentation and deep neural networks for the classification of Pakistani racial speakers recognition

Speech emotion recognition (SER) systems have evolved into an important method for recognizing a person in several applications, including e-commerce, everyday interactions, law enforcement, and forensics. The SER system’s efficiency depends on the length of the audio samples used for testing and tr...

Bibliographic Details
Main Authors:	Amjad, Ammar; Khan, Lal; Chang, Hsien-Tsung
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2022
Subjects:	Artificial Intelligence
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9454772/ https://www.ncbi.nlm.nih.gov/pubmed/36091976 http://dx.doi.org/10.7717/peerj-cs.1053

_version_	1784785430041853952
author	Amjad, Ammar; Khan, Lal; Chang, Hsien-Tsung
author_facet	Amjad, Ammar; Khan, Lal; Chang, Hsien-Tsung
author_sort	Amjad, Ammar
collection	PubMed
description	Speech emotion recognition (SER) systems have evolved into an important method for recognizing a person in several applications, including e-commerce, everyday interactions, law enforcement, and forensics. The SER system’s efficiency depends on the length of the audio samples used for testing and training. However, the different suggested models successfully obtained relatively high accuracy in this study. Moreover, the degree of SER efficiency is not yet optimum due to the limited database, resulting in overfitting and skewing samples. Therefore, the proposed approach presents a data augmentation method that shifts the pitch, uses multiple window sizes, stretches the time, and adds white noise to the original audio. In addition, a deep model is further evaluated to generate a new paradigm for SER. The data augmentation approach increased the limited amount of data from the Pakistani racial speaker speech dataset in the proposed system. The seven-layer framework was employed to provide the most optimal performance in terms of accuracy compared to other multilayer approaches. The seven-layer method is used in existing works to achieve a very high level of accuracy. The suggested system achieved 97.32% accuracy with a 0.032% loss in the 75%:25% splitting ratio. In addition, more than 500 augmentation data samples were added. Therefore, the proposed approach results show that deep neural networks with data augmentation can enhance the SER performance on the Pakistani racial speech dataset.
format	Online Article Text
id	pubmed-9454772
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-9454772 2022-09-09 PeerJ Comput Sci Artificial Intelligence PeerJ Inc. 2022-08-03 /pmc/articles/PMC9454772/ /pubmed/36091976 http://dx.doi.org/10.7717/peerj-cs.1053 Text en © 2022 Amjad et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	Data augmentation and deep neural networks for the classification of Pakistani racial speakers recognition
topic	Artificial Intelligence
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9454772/ https://www.ncbi.nlm.nih.gov/pubmed/36091976 http://dx.doi.org/10.7717/peerj-cs.1053