
Emotional Speech Recognition Using Deep Neural Networks

Bibliographic Details
Main Authors: Trinh Van, Loan, Dao Thi Le, Thuy, Le Xuan, Thanh, Castelli, Eric
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8877219/
https://www.ncbi.nlm.nih.gov/pubmed/35214316
http://dx.doi.org/10.3390/s22041414
_version_ 1784658368219054080
author Trinh Van, Loan
Dao Thi Le, Thuy
Le Xuan, Thanh
Castelli, Eric
author_facet Trinh Van, Loan
Dao Thi Le, Thuy
Le Xuan, Thanh
Castelli, Eric
author_sort Trinh Van, Loan
collection PubMed
description The expression of emotion plays an important role in human communication, adding information to the message conveyed to a partner. Humans express emotion in many forms: body language, facial expressions, eye contact, laughter, and tone of voice. The world's languages differ, but even without understanding the language being spoken, listeners can often grasp part of the message from these emotional cues. Among these forms of expression, emotion conveyed through the voice is perhaps the most widely studied. This article presents our research on speech emotion recognition using deep neural networks such as CNNs, CRNNs, and GRUs. We used the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus with four emotions: anger, happiness, sadness, and neutrality. The feature parameters used for recognition include the Mel spectral coefficients and other parameters related to the spectrum and intensity of the speech signal. Data augmentation was performed by changing the voice and adding white noise. The results show that the GRU model gave the highest average recognition accuracy, 97.47%, which surpasses existing studies on speech emotion recognition with the IEMOCAP corpus.
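The abstract mentions augmenting the training data by adding white noise. As an illustrative sketch only (the paper's actual augmentation parameters are not given in this record), additive white Gaussian noise at a target signal-to-noise ratio can be implemented as follows; the function name, the 20 dB SNR value, and the sine-wave "utterance" are assumptions for demonstration:

```python
import numpy as np

def add_white_noise(signal: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white Gaussian noise so the result has roughly the given SNR in dB."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(signal ** 2)
    # SNR (dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Example: a 1 s synthetic tone at 16 kHz, augmented at 20 dB SNR
t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)
noisy = add_white_noise(clean, snr_db=20)
```

In practice each training utterance would typically be augmented at several SNR levels so the model sees both clean and degraded versions of the same emotional content.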
format Online
Article
Text
id pubmed-8877219
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8877219 2022-02-26 Emotional Speech Recognition Using Deep Neural Networks. Trinh Van, Loan; Dao Thi Le, Thuy; Le Xuan, Thanh; Castelli, Eric. Sensors (Basel), Article. MDPI 2022-02-12 /pmc/articles/PMC8877219/ /pubmed/35214316 http://dx.doi.org/10.3390/s22041414 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Trinh Van, Loan
Dao Thi Le, Thuy
Le Xuan, Thanh
Castelli, Eric
Emotional Speech Recognition Using Deep Neural Networks
title Emotional Speech Recognition Using Deep Neural Networks
title_full Emotional Speech Recognition Using Deep Neural Networks
title_fullStr Emotional Speech Recognition Using Deep Neural Networks
title_full_unstemmed Emotional Speech Recognition Using Deep Neural Networks
title_short Emotional Speech Recognition Using Deep Neural Networks
title_sort emotional speech recognition using deep neural networks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8877219/
https://www.ncbi.nlm.nih.gov/pubmed/35214316
http://dx.doi.org/10.3390/s22041414
work_keys_str_mv AT trinhvanloan emotionalspeechrecognitionusingdeepneuralnetworks
AT daothilethuy emotionalspeechrecognitionusingdeepneuralnetworks
AT lexuanthanh emotionalspeechrecognitionusingdeepneuralnetworks
AT castellieric emotionalspeechrecognitionusingdeepneuralnetworks
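The abstract reports that a GRU model gave the best recognition accuracy. For readers unfamiliar with the unit, here is a minimal NumPy sketch of a single GRU cell's forward step in its standard formulation; the weight shapes, dimensions, and random toy input are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU time step; W, U, b hold the update (z), reset (r),
    and candidate (h) parameters keyed by gate name."""
    z = sigmoid(W["z"] @ x + U["z"] @ h + b["z"])               # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h + b["r"])               # reset gate
    h_tilde = np.tanh(W["h"] @ x + U["h"] @ (r * h) + b["h"])   # candidate state
    return (1 - z) * h + z * h_tilde                            # interpolated new state

# Toy run over a short sequence of feature frames (e.g. Mel-spectral vectors)
rng = np.random.default_rng(0)
d_in, d_h = 8, 4
W = {k: rng.normal(scale=0.1, size=(d_h, d_in)) for k in "zrh"}
U = {k: rng.normal(scale=0.1, size=(d_h, d_h)) for k in "zrh"}
b = {k: np.zeros(d_h) for k in "zrh"}
h = np.zeros(d_h)
for frame in rng.normal(size=(5, d_in)):
    h = gru_step(frame, h, W, U, b)
```

In an emotion classifier like the one described, the final hidden state (or a pooling over all states) would feed a softmax layer over the four emotion classes.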