
An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition

Multimodal emotion recognition involves the use of different resources and techniques to identify and recognize human emotions. A variety of data sources, such as faces, speech, voice, and text, have to be processed simultaneously for this recognition task. However, most of these techniques, which are based mainly on deep learning, are trained on datasets designed and built under controlled conditions, which makes them harder to apply in real contexts under real conditions. For this reason, the aim of this work is to assess a set of in-the-wild datasets and show their strengths and weaknesses for multimodal emotion recognition. Four in-the-wild datasets are evaluated: AFEW, SFEW, MELD and AffWild2. A previously designed multimodal architecture is used to perform the evaluation, and classical metrics such as accuracy and F1-score are used to measure training performance and to validate the quantitative results. The strengths and weaknesses of these datasets for various uses indicate that, by themselves, they are not appropriate for multimodal recognition because of their original purpose, e.g., face or speech recognition. Therefore, we recommend combining multiple datasets, with a good balance in the number of samples per class, in order to obtain better results when new samples are processed.

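The abstract names accuracy and F1-score as the evaluation metrics and stresses a good balance in the number of samples per class. As a minimal illustrative sketch (not code from the paper; the seven-emotion label set, the made-up predictions, and the use of scikit-learn are all assumptions), the following shows why macro-averaged F1 is more telling than accuracy on an imbalanced in-the-wild test set:

```python
# Illustrative sketch only -- not the authors' evaluation code.
# Compares accuracy and macro F1 for a hypothetical 7-class emotion classifier
# evaluated on an imbalanced test set where "neutral" dominates.
from sklearn.metrics import accuracy_score, f1_score

EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

# Hypothetical ground truth: 70 neutral, 15 happiness, 10 anger, 5 fear clips.
y_true = ["neutral"] * 70 + ["happiness"] * 15 + ["anger"] * 10 + ["fear"] * 5
# Hypothetical predictions: the model over-predicts the majority class
# and never predicts "fear" at all.
y_pred = ["neutral"] * 85 + ["happiness"] * 10 + ["anger"] * 5

print("accuracy:", accuracy_score(y_true, y_pred))              # 0.70
print("macro F1:", f1_score(y_true, y_pred, average="macro",
                            labels=EMOTIONS, zero_division=0))  # ~0.13
```

On this made-up split, correct majority-class predictions keep accuracy at 0.70 while macro F1 falls to roughly 0.13, which is the kind of gap that motivates reporting both metrics and balancing samples per class when datasets are combined.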

Bibliographic Details
Main authors: Aguilera, Ana; Mellado, Diego; Rojas, Felipe
Format: Online article (text)
Language: English
Published: MDPI, 2023
Subjects: Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10255527/
https://www.ncbi.nlm.nih.gov/pubmed/37299912
http://dx.doi.org/10.3390/s23115184
Collection: PubMed
Record ID: pubmed-10255527
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sensors (Basel)
Published online: 2023-05-30
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).