
Vector learning representation for generalized speech emotion recognition

Speech emotion recognition (SER) plays an important role in global business today by improving service efficiency. In the SER literature, many techniques use deep learning to extract and learn features. Recently, we proposed end-to-end learning for a deep residual local feature lea...


Bibliographic Details
Main authors:	Singkul, Sattaya; Woraratpanya, Kuntpong
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9280549/ https://www.ncbi.nlm.nih.gov/pubmed/35846479 http://dx.doi.org/10.1016/j.heliyon.2022.e09196

_version_	1784746671967567872
author	Singkul, Sattaya; Woraratpanya, Kuntpong
collection	PubMed
description	Speech emotion recognition (SER) plays an important role in global business today by improving service efficiency. In the SER literature, many techniques use deep learning to extract and learn features. Recently, we proposed end-to-end learning for a deep residual local feature learning block (DeepResLFLB). The advantages of end-to-end learning are low engineering effort and less hyperparameter tuning. Nevertheless, this learning method easily falls into overfitting. Therefore, this paper describes the “verify-to-classify” framework, which learns vectors extracted from feature spaces of emotional information. The framework consists of two main parts: speech emotion learning and speech emotion recognition. Speech emotion learning consists of two steps: speech emotion verification enrolled training and prediction. Residual learning (ResNet) with a squeeze-excitation (SE) block is the core component of both steps; it extracts emotional state vectors and builds an emotion model through the speech emotion verification enrolled training. The in-domain pre-trained weights of the trained emotion model are then transferred to the prediction step. As a result of speech emotion learning, the accepted model, validated by equal error rate (EER), is transferred to speech emotion recognition as out-domain pre-trained weights, which are ready for classification using a classical ML method. In this setting, a loss function suited to emotional vectors is important. Two loss functions are proposed: angular prototypical loss and softmax with angular prototypical loss. On two publicly available datasets, Emo-DB and RAVDESS, both with high- and low-quality environments, the experimental results show that our proposed method can significantly improve generalized performance and explainable emotion results when evaluated by standard metrics: EER, accuracy, precision, recall, and F1-score.
format	Online Article Text
id	pubmed-9280549
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-9280549 2022-07-15 Vector learning representation for generalized speech emotion recognition Singkul, Sattaya; Woraratpanya, Kuntpong Heliyon Research Article Elsevier 2022-03-28 /pmc/articles/PMC9280549/ /pubmed/35846479 http://dx.doi.org/10.1016/j.heliyon.2022.e09196 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Vector learning representation for generalized speech emotion recognition
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9280549/ https://www.ncbi.nlm.nih.gov/pubmed/35846479 http://dx.doi.org/10.1016/j.heliyon.2022.e09196
