Speech emotion recognition using machine learning techniques: Feature extraction and comparison of convolutional neural network and random forest

Speech is a direct and rich way of transmitting information and emotions from one point to another. In this study, we aimed to classify different emotions in speech using various audio features and machine learning models. We extracted various types of audio features such as Mel-frequency cepstral c...

Bibliographic Details
Main authors:	Rezapour Mashhadi, Mohammad Mahdi; Osei-Bonsu, Kofi
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662716/ https://www.ncbi.nlm.nih.gov/pubmed/37988352 http://dx.doi.org/10.1371/journal.pone.0291500

author	Rezapour Mashhadi, Mohammad Mahdi; Osei-Bonsu, Kofi
collection	PubMed
description	Speech is a direct and rich way of transmitting information and emotions from one point to another. In this study, we aimed to classify different emotions in speech using various audio features and machine learning models. We extracted several types of audio features: Mel-frequency cepstral coefficients, chromagram, Mel-scale spectrogram, spectral contrast, Tonnetz representation, and zero-crossing rate. We used a limited speech emotion recognition (SER) dataset and augmented it with additional audio recordings. In addition, in contrast to many previous studies, we combined all audio files together before conducting our analysis. We compared the performance of two models: a one-dimensional convolutional neural network (conv1D) and a random forest (RF) with RF-based feature selection. Our results showed that RF with feature selection achieved higher average accuracy (69%) than conv1D and had the highest precision for fear (72%) and the highest recall for calm (84%). Our study demonstrates the effectiveness of RF with feature selection for speech emotion classification using a limited dataset. We found that, for both algorithms, anger is most often misclassified as happy, disgust as sad and neutral, and fear as sad. This could be due to the similarity of some acoustic features between these emotions, such as pitch, intensity, and tempo.
format	Online Article Text
id	pubmed-10662716
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-10662716 2023-11-21 PLoS One Research Article
Public Library of Science 2023-11-21 /pmc/articles/PMC10662716/ /pubmed/37988352 http://dx.doi.org/10.1371/journal.pone.0291500 Text en © 2023 Rezapour Mashhadi, Osei-Bonsu https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Speech emotion recognition using machine learning techniques: Feature extraction and comparison of convolutional neural network and random forest
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662716/ https://www.ncbi.nlm.nih.gov/pubmed/37988352 http://dx.doi.org/10.1371/journal.pone.0291500
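
The description above reports average accuracy, per-emotion precision and recall, and which emotions are confused with each other. As a rough illustration of how such figures are derived from a confusion matrix, here is a minimal Python sketch. The labels and counts in TOY_CONFUSION are hypothetical toy values chosen for the example, not results from the article, and the function names are illustrative rather than taken from the authors' code.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: rows are true emotions, columns are predicted emotions.
# These counts are invented for illustration and are NOT from the article.
LABELS: List[str] = ["anger", "happy", "sad"]
TOY_CONFUSION: List[List[int]] = [
    [6, 3, 1],  # true anger
    [2, 7, 1],  # true happy
    [1, 1, 8],  # true sad
]


def per_class_metrics(
    labels: List[str], confusion: List[List[int]]
) -> Tuple[float, Dict[str, Tuple[float, float]]]:
    """Return overall accuracy and a {label: (precision, recall)} mapping.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    accuracy = correct / total if total else 0.0

    metrics: Dict[str, Tuple[float, float]] = {}
    for k, name in enumerate(labels):
        true_positive = confusion[k][k]
        predicted_as_k = sum(confusion[i][k] for i in range(n))  # column sum
        actually_k = sum(confusion[k])  # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        metrics[name] = (precision, recall)
    return accuracy, metrics


def most_confused_with(
    labels: List[str], confusion: List[List[int]], true_label: str
) -> str:
    """Return the wrong label most often predicted for samples of true_label."""
    k = labels.index(true_label)
    candidates = [(count, labels[j]) for j, count in enumerate(confusion[k]) if j != k]
    return max(candidates)[1]


if __name__ == "__main__":
    accuracy, metrics = per_class_metrics(LABELS, TOY_CONFUSION)
    print(f"accuracy: {accuracy:.2f}")
    for name, (precision, recall) in metrics.items():
        print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
    print("anger is most often confused with:", most_confused_with(LABELS, TOY_CONFUSION, "anger"))
```

Precision is read down a column (of everything predicted as an emotion, how much was right) and recall across a row (of everything that truly was that emotion, how much was found), which is why a model can lead on precision for one emotion and on recall for another.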

Similar items