
Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders

Understanding and identifying emotional cues in human speech is a crucial aspect of human–computer communication. The application of computer technology in dissecting and deciphering emotions, along with the extraction of relevant emotional characteristics from speech, forms a significant part of th...


Bibliographic Details
Main authors: Pulatov, Ilkhomjon; Oteniyazov, Rashid; Makhmudov, Fazliddin; Cho, Young-Im
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10383041/
https://www.ncbi.nlm.nih.gov/pubmed/37514933
http://dx.doi.org/10.3390/s23146640
_version_ 1785080809114304512
author Pulatov, Ilkhomjon
Oteniyazov, Rashid
Makhmudov, Fazliddin
Cho, Young-Im
author_facet Pulatov, Ilkhomjon
Oteniyazov, Rashid
Makhmudov, Fazliddin
Cho, Young-Im
author_sort Pulatov, Ilkhomjon
collection PubMed
description Understanding and identifying emotional cues in human speech is a crucial aspect of human–computer communication. The application of computer technology in dissecting and deciphering emotions, along with the extraction of relevant emotional characteristics from speech, forms a significant part of this process. The objective of this study was to design a framework for speech emotion recognition based on spectrogram and semantic feature encoders, aiming to improve accuracy by addressing the shortcomings of existing methods. To obtain informative features for speech emotion recognition, the study used two different strategies. First, a fully convolutional neural network model was used to encode speech spectrograms. Second, a Mel-frequency cepstral coefficient (MFCC) feature extraction approach was adopted and integrated with Speech2Vec for semantic feature encoding. The two feature types were processed separately before being fed into a long short-term memory (LSTM) network and a fully connected layer for further representation, with the aim of recognizing and interpreting emotion in human speech more accurately. The proposed model was evaluated on two datasets, RAVDESS and EMO-DB. It outperformed established models, achieving an accuracy of 94.8% on RAVDESS and 94.0% on EMO-DB.
format Online
Article
Text
id pubmed-10383041
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10383041 2023-07-30 Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders Pulatov, Ilkhomjon; Oteniyazov, Rashid; Makhmudov, Fazliddin; Cho, Young-Im Sensors (Basel) Article MDPI 2023-07-24 /pmc/articles/PMC10383041/ /pubmed/37514933 http://dx.doi.org/10.3390/s23146640 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Pulatov, Ilkhomjon
Oteniyazov, Rashid
Makhmudov, Fazliddin
Cho, Young-Im
Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders
title Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders
title_full Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders
title_fullStr Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders
title_full_unstemmed Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders
title_short Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders
title_sort enhancing speech emotion recognition using dual feature extraction encoders
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10383041/
https://www.ncbi.nlm.nih.gov/pubmed/37514933
http://dx.doi.org/10.3390/s23146640
work_keys_str_mv AT pulatovilkhomjon enhancingspeechemotionrecognitionusingdualfeatureextractionencoders
AT oteniyazovrashid enhancingspeechemotionrecognitionusingdualfeatureextractionencoders
AT makhmudovfazliddin enhancingspeechemotionrecognitionusingdualfeatureextractionencoders
AT choyoungim enhancingspeechemotionrecognitionusingdualfeatureextractionencoders
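
The description field in this record mentions Mel-frequency cepstral coefficient (MFCC) features as one of the two inputs to the model. As a rough illustration only, the sketch below computes textbook MFCCs for a single audio frame in plain Python. It is not the authors' feature extraction method, which this record does not specify. The function names, the Hamming window, the 20-filter mel bank, and the 13 output coefficients are all assumptions chosen for the example.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return its one-sided power spectrum.

    Uses a direct DFT (O(n^2)); fine for a short illustrative frame.
    """
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Build triangular filters spaced evenly on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (n_filters + 1)
                  for i in range(n_filters + 2)]
    # Map each mel point to a DFT bin index, clamped to the valid range.
    bins = [min(n_fft // 2, int((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Return n_coeffs MFCCs for one frame (unnormalised DCT-II)."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for filters that capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    return [sum(e * math.cos(math.pi * n * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for n in range(n_coeffs)]


if __name__ == "__main__":
    sr = 8000  # hypothetical sample rate for the demo
    frame = [math.sin(2.0 * math.pi * 440.0 * t / sr) for t in range(256)]
    print([round(c, 2) for c in mfcc(frame, sr)])
```

In practice a per-frame vector like this would be computed over many overlapping frames, giving a sequence that a recurrent layer such as an LSTM can consume. A library routine would normally replace the direct DFT used here.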