
Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

Emotion Recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as in healthcare or in road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and Fine-Tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when they were used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover new ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users’ emotional state and their combination enables improvement of system performance.


Bibliographic Details
Main Authors: Luna-Jiménez, Cristina, Griol, David, Callejas, Zoraida, Kleinlein, Ricardo, Montero, Juan M., Fernández-Martínez, Fernando
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8618559/
https://www.ncbi.nlm.nih.gov/pubmed/34833739
http://dx.doi.org/10.3390/s21227665
_version_ 1784604776944631808
author Luna-Jiménez, Cristina
Griol, David
Callejas, Zoraida
Kleinlein, Ricardo
Montero, Juan M.
Fernández-Martínez, Fernando
author_facet Luna-Jiménez, Cristina
Griol, David
Callejas, Zoraida
Kleinlein, Ricardo
Montero, Juan M.
Fernández-Martínez, Fernando
author_sort Luna-Jiménez, Cristina
collection PubMed
description Emotion Recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as in healthcare or in road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and Fine-Tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when they were used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover new ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users’ emotional state and their combination enables improvement of system performance.
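The abstract describes a three-part pipeline: a fine-tuned PANNs CNN-14 classifies the speech signal (fine-tuning here amounts to replacing the pre-trained network's final classification layer with an 8-way emotion head and continuing training), a pre-trained Spatial Transformer Network produces per-frame facial embeddings that a bi-LSTM with attention aggregates over time, and the two modality posteriors are merged by late fusion. The following minimal PyTorch sketch illustrates only the facial aggregation and fusion steps; the module shapes, the additive-attention design, and the mixing weight `alpha` are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_EMOTIONS = 8  # RAVDESS: neutral, calm, happy, sad, angry, fearful, disgust, surprised


class FrameAttentionBiLSTM(nn.Module):
    """Hypothetical stand-in for the facial branch: per-frame embeddings
    (e.g., from a pre-trained Spatial Transformer Network) are aggregated
    by a bi-LSTM with a simple additive attention over time."""

    def __init__(self, emb_dim=512, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, NUM_EMOTIONS)

    def forward(self, frames):               # frames: (batch, time, emb_dim)
        h, _ = self.lstm(frames)              # (batch, time, 2*hidden)
        w = F.softmax(self.attn(h), dim=1)    # attention weights over the time axis
        ctx = (w * h).sum(dim=1)              # attention-weighted temporal pooling
        return self.head(ctx)                 # per-clip emotion logits


def late_fusion(speech_logits, face_logits, alpha=0.5):
    """Late fusion as a weighted average of per-modality posteriors.
    `alpha` is an assumed mixing weight, not a value from the paper."""
    p_speech = F.softmax(speech_logits, dim=-1)
    p_face = F.softmax(face_logits, dim=-1)
    return alpha * p_speech + (1 - alpha) * p_face


if __name__ == "__main__":
    face_branch = FrameAttentionBiLSTM()
    frames = torch.randn(2, 30, 512)              # 2 clips, 30 frames of STN embeddings
    speech_logits = torch.randn(2, NUM_EMOTIONS)  # placeholder for fine-tuned CNN-14 output
    fused = late_fusion(speech_logits, face_branch(frames))
    print(fused.argmax(dim=-1))                   # predicted emotion index per clip
```

Per the abstract, it is this late-fused combination, evaluated with subject-wise 5-fold cross-validation, that reaches 80.08% accuracy over the eight RAVDESS emotion classes.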
format Online
Article
Text
id pubmed-8618559
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8618559 2021-11-27 Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning Luna-Jiménez, Cristina Griol, David Callejas, Zoraida Kleinlein, Ricardo Montero, Juan M. Fernández-Martínez, Fernando Sensors (Basel) Article Emotion Recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as in healthcare or in road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and Fine-Tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when they were used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover new ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users’ emotional state and their combination enables improvement of system performance. MDPI 2021-11-18 /pmc/articles/PMC8618559/ /pubmed/34833739 http://dx.doi.org/10.3390/s21227665 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Luna-Jiménez, Cristina
Griol, David
Callejas, Zoraida
Kleinlein, Ricardo
Montero, Juan M.
Fernández-Martínez, Fernando
Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
title Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
title_full Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
title_fullStr Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
title_full_unstemmed Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
title_short Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
title_sort multimodal emotion recognition on ravdess dataset using transfer learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8618559/
https://www.ncbi.nlm.nih.gov/pubmed/34833739
http://dx.doi.org/10.3390/s21227665
work_keys_str_mv AT lunajimenezcristina multimodalemotionrecognitiononravdessdatasetusingtransferlearning
AT grioldavid multimodalemotionrecognitiononravdessdatasetusingtransferlearning
AT callejaszoraida multimodalemotionrecognitiononravdessdatasetusingtransferlearning
AT kleinleinricardo multimodalemotionrecognitiononravdessdatasetusingtransferlearning
AT monterojuanm multimodalemotionrecognitiononravdessdatasetusingtransferlearning
AT fernandezmartinezfernando multimodalemotionrecognitiononravdessdatasetusingtransferlearning