An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition
Main Authors: | Aguilera, Ana; Mellado, Diego; Rojas, Felipe |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10255527/ https://www.ncbi.nlm.nih.gov/pubmed/37299912 http://dx.doi.org/10.3390/s23115184 |
_version_ | 1785056894050631680 |
---|---|
author | Aguilera, Ana; Mellado, Diego; Rojas, Felipe
author_facet | Aguilera, Ana; Mellado, Diego; Rojas, Felipe
author_sort | Aguilera, Ana |
collection | PubMed |
description | Multimodal emotion recognition involves the use of different resources and techniques to identify and recognize human emotions. A variety of data sources, such as faces, speech, voice, text and others, have to be processed simultaneously for this recognition task. However, most of these techniques, which are based mainly on deep learning, are trained on datasets designed and built under controlled conditions, which makes them harder to apply in real contexts under real conditions. For this reason, the aim of this work is to assess a set of in-the-wild datasets and show their strengths and weaknesses for multimodal emotion recognition. Four in-the-wild datasets are evaluated: AFEW, SFEW, MELD and AffWild2. A previously designed multimodal architecture is used to perform the evaluation, and classical metrics such as accuracy and F1-score are used to measure performance during training and to validate the quantitative results. The strengths and weaknesses of these datasets for various uses indicate that, on their own, they are not appropriate for multimodal recognition because of their original purpose, e.g., face or speech recognition. Therefore, we recommend combining multiple datasets to obtain better results when new samples are processed and a better balance in the number of samples per class. (An illustrative sketch of these metrics appears after the record fields below.) |
format | Online Article Text |
id | pubmed-10255527 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10255527 2023-06-10 An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition Aguilera, Ana; Mellado, Diego; Rojas, Felipe Sensors (Basel) Article Multimodal emotion recognition involves the use of different resources and techniques to identify and recognize human emotions. A variety of data sources, such as faces, speech, voice, text and others, have to be processed simultaneously for this recognition task. However, most of these techniques, which are based mainly on deep learning, are trained on datasets designed and built under controlled conditions, which makes them harder to apply in real contexts under real conditions. For this reason, the aim of this work is to assess a set of in-the-wild datasets and show their strengths and weaknesses for multimodal emotion recognition. Four in-the-wild datasets are evaluated: AFEW, SFEW, MELD and AffWild2. A previously designed multimodal architecture is used to perform the evaluation, and classical metrics such as accuracy and F1-score are used to measure performance during training and to validate the quantitative results. The strengths and weaknesses of these datasets for various uses indicate that, on their own, they are not appropriate for multimodal recognition because of their original purpose, e.g., face or speech recognition. Therefore, we recommend combining multiple datasets to obtain better results when new samples are processed and a better balance in the number of samples per class. MDPI 2023-05-30 /pmc/articles/PMC10255527/ /pubmed/37299912 http://dx.doi.org/10.3390/s23115184 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article; Aguilera, Ana; Mellado, Diego; Rojas, Felipe; An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition
title | An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition |
title_full | An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition |
title_fullStr | An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition |
title_full_unstemmed | An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition |
title_short | An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition |
title_sort | assessment of in-the-wild datasets for multimodal emotion recognition |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10255527/ https://www.ncbi.nlm.nih.gov/pubmed/37299912 http://dx.doi.org/10.3390/s23115184 |
work_keys_str_mv | AT aguileraana anassessmentofinthewilddatasetsformultimodalemotionrecognition AT melladodiego anassessmentofinthewilddatasetsformultimodalemotionrecognition AT rojasfelipe anassessmentofinthewilddatasetsformultimodalemotionrecognition AT aguileraana assessmentofinthewilddatasetsformultimodalemotionrecognition AT melladodiego assessmentofinthewilddatasetsformultimodalemotionrecognition AT rojasfelipe assessmentofinthewilddatasetsformultimodalemotionrecognition |
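The description above names classical metrics (accuracy and F1-score) and recommends combining datasets with a good balance in the number of samples per class. The following is a minimal illustrative sketch of how such metrics and a per-class sample count could be computed; it is not the authors' code, and the scikit-learn calls, emotion labels and merged label list are assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' pipeline): compute the classical metrics named in the
# abstract -- accuracy and F1-score -- and inspect class balance for a merged label set.
# The emotion labels and predictions below are hypothetical placeholders.
from collections import Counter

from sklearn.metrics import accuracy_score, f1_score

# Hypothetical ground-truth and predicted emotion labels for a validation split.
y_true = ["anger", "joy", "neutral", "sadness", "joy", "neutral"]
y_pred = ["anger", "neutral", "neutral", "sadness", "joy", "joy"]

print("accuracy:", accuracy_score(y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))  # unweighted mean over classes

# Merge labels from several datasets (e.g., AFEW, SFEW, MELD, AffWild2) and count samples
# per emotion class, to check the per-class balance the paper recommends.
labels_from_other_datasets = ["fear", "anger", "surprise"]  # placeholder values
combined = y_true + labels_from_other_datasets
print(Counter(combined))
```

Macro-averaged F1 is used in the sketch because it weights every emotion class equally, which is the property that matters when per-class sample counts are imbalanced.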