
Speech Emotion Recognition Using Attention Model

Speech emotion recognition is an important research topic that can help to maintain and improve public health and contribute towards the ongoing progress of healthcare technology. There have been several advancements in the field of speech emotion recognition systems including the use of deep learni...


Bibliographic Details
Main Authors: Singh, Jagjeet; Saheer, Lakshmi Babu; Faust, Oliver
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10049636/
https://www.ncbi.nlm.nih.gov/pubmed/36982048
http://dx.doi.org/10.3390/ijerph20065140
author Singh, Jagjeet
Saheer, Lakshmi Babu
Faust, Oliver
collection PubMed
description Speech emotion recognition is an important research topic that can help to maintain and improve public health and contribute towards the ongoing progress of healthcare technology. There have been several advancements in the field of speech emotion recognition systems including the use of deep learning models and new acoustic and temporal features. This paper proposes a self-attention-based deep learning model that was created by combining a two-dimensional Convolutional Neural Network (CNN) and a long short-term memory (LSTM) network. This research builds on the existing literature to identify the best-performing features for this task with extensive experiments on different combinations of spectral and rhythmic information. Mel Frequency Cepstral Coefficients (MFCCs) emerged as the best performing features for this task. The experiments were performed on a customised dataset that was developed as a combination of RAVDESS, SAVEE, and TESS datasets. Eight states of emotions (happy, sad, angry, surprise, disgust, calm, fearful, and neutral) were detected. The proposed attention-based deep learning model achieved an average test accuracy rate of 90%, which is a substantial improvement over established models. Hence, this emotion detection model has the potential to improve automated mental health monitoring.
format Online
Article
Text
id pubmed-10049636
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-100496362023-03-29 Speech Emotion Recognition Using Attention Model Singh, Jagjeet Saheer, Lakshmi Babu Faust, Oliver Int J Environ Res Public Health Article Speech emotion recognition is an important research topic that can help to maintain and improve public health and contribute towards the ongoing progress of healthcare technology. There have been several advancements in the field of speech emotion recognition systems including the use of deep learning models and new acoustic and temporal features. This paper proposes a self-attention-based deep learning model that was created by combining a two-dimensional Convolutional Neural Network (CNN) and a long short-term memory (LSTM) network. This research builds on the existing literature to identify the best-performing features for this task with extensive experiments on different combinations of spectral and rhythmic information. Mel Frequency Cepstral Coefficients (MFCCs) emerged as the best performing features for this task. The experiments were performed on a customised dataset that was developed as a combination of RAVDESS, SAVEE, and TESS datasets. Eight states of emotions (happy, sad, angry, surprise, disgust, calm, fearful, and neutral) were detected. The proposed attention-based deep learning model achieved an average test accuracy rate of 90%, which is a substantial improvement over established models. Hence, this emotion detection model has the potential to improve automated mental health monitoring. MDPI 2023-03-14 /pmc/articles/PMC10049636/ /pubmed/36982048 http://dx.doi.org/10.3390/ijerph20065140 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Speech Emotion Recognition Using Attention Model
topic Article