Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion
| Main Authors: | Xie, Baijun; Sidulova, Mariia; Park, Chung Hyuk |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | MDPI, 2021 |
| Subjects: | Article |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8309929/ https://www.ncbi.nlm.nih.gov/pubmed/34300651 http://dx.doi.org/10.3390/s21144913 |
| _version_ | 1783728638949064704 |
|---|---|
| author | Xie, Baijun; Sidulova, Mariia; Park, Chung Hyuk |
| author_facet | Xie, Baijun; Sidulova, Mariia; Park, Chung Hyuk |
| author_sort | Xie, Baijun |
| collection | PubMed |
| description | Decades of scientific research have been conducted on developing and evaluating methods for automated emotion recognition. With rapidly advancing technology, a wide range of emerging applications require recognition of the user's emotional state. This paper investigates a robust approach for multimodal emotion recognition during a conversation. Three separate models for the audio, video, and text modalities are structured and fine-tuned on the MELD (Multimodal EmotionLines Dataset). A transformer-based crossmodality fusion with the EmbraceNet architecture is employed to estimate the emotion. The proposed multimodal network architecture achieves up to 65% accuracy, significantly surpassing each of the unimodal models. Multiple evaluation techniques show that the model is robust and can even outperform state-of-the-art models on MELD. (A minimal sketch of this fusion pipeline follows the record below.) |
| format | Online Article Text |
| id | pubmed-8309929 |
| institution | National Center for Biotechnology Information |
| language | English |
| publishDate | 2021 |
| publisher | MDPI |
| record_format | MEDLINE/PubMed |
| spelling | pubmed-8309929 2021-07-25 Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion Xie, Baijun; Sidulova, Mariia; Park, Chung Hyuk Sensors (Basel) Article Decades of scientific research have been conducted on developing and evaluating methods for automated emotion recognition. With exponentially growing technology, there is a wide range of emerging applications that require emotional state recognition of the user. This paper investigates a robust approach for multimodal emotion recognition during a conversation. Three separate models for audio, video and text modalities are structured and fine-tuned on the MELD. In this paper, a transformer-based crossmodality fusion with the EmbraceNet architecture is employed to estimate the emotion. The proposed multimodal network architecture can achieve up to 65% accuracy, which significantly surpasses any of the unimodal models. We provide multiple evaluation techniques applied to our work to show that our model is robust and can even outperform the state-of-the-art models on the MELD. MDPI 2021-07-19 /pmc/articles/PMC8309929/ /pubmed/34300651 http://dx.doi.org/10.3390/s21144913 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
| spellingShingle | Article; Xie, Baijun; Sidulova, Mariia; Park, Chung Hyuk; Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion |
| title | Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion |
| title_full | Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion |
| title_fullStr | Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion |
| title_full_unstemmed | Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion |
| title_short | Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion |
| title_sort | robust multimodal emotion recognition from conversation with transformer-based crossmodality fusion |
| topic | Article |
| url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8309929/ https://www.ncbi.nlm.nih.gov/pubmed/34300651 http://dx.doi.org/10.3390/s21144913 |
| work_keys_str_mv | AT xiebaijun robustmultimodalemotionrecognitionfromconversationwithtransformerbasedcrossmodalityfusion AT sidulovamariia robustmultimodalemotionrecognitionfromconversationwithtransformerbasedcrossmodalityfusion AT parkchunghyuk robustmultimodalemotionrecognitionfromconversationwithtransformerbasedcrossmodalityfusion |
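
The description above outlines the approach: three unimodal encoders (audio, video, text) whose features pass through a transformer-based crossmodality stage and an EmbraceNet-style fusion layer before classification into the seven MELD emotion categories. Below is a minimal, hypothetical PyTorch sketch of such a fusion head. All dimensions, layer counts, and names (`CrossmodalEmotionClassifier`, `EmbraceFusion`) are illustrative assumptions, not the authors' released code; the unimodal encoders are assumed to be pretrained elsewhere and to emit fixed-size feature vectors.

```python
# Hypothetical sketch of transformer-based crossmodality fusion with an
# EmbraceNet-style layer, loosely following the paper's description.
# Dimensions and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbraceFusion(nn.Module):
    """EmbraceNet-style fusion: dock each modality to a shared width,
    then keep exactly one modality's value per feature coordinate."""

    def __init__(self, in_dims, embrace_dim=256):
        super().__init__()
        self.dock = nn.ModuleList(nn.Linear(d, embrace_dim) for d in in_dims)
        self.embrace_dim = embrace_dim

    def forward(self, feats, availability=None):
        # feats: list of (batch, d_k) tensors, one per modality.
        docked = torch.stack([f(x) for f, x in zip(self.dock, feats)], dim=1)
        batch, n_mod, _ = docked.shape
        if availability is None:  # 0/1 mask marking present modalities
            availability = docked.new_ones(batch, n_mod)
        probs = availability / availability.sum(dim=1, keepdim=True)
        # Draw one source modality for every feature coordinate.
        idx = torch.multinomial(probs, self.embrace_dim, replacement=True)
        mask = F.one_hot(idx, n_mod).permute(0, 2, 1).float()
        return (docked * mask).sum(dim=1)  # (batch, embrace_dim)


class CrossmodalEmotionClassifier(nn.Module):
    """Audio/video/text features -> crossmodal transformer -> EmbraceNet
    -> logits over the 7 MELD emotion classes."""

    def __init__(self, dims=(512, 512, 768), d_model=256, n_classes=7):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, d_model) for d in dims)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.crossmodal = nn.TransformerEncoder(layer, num_layers=2)
        self.fusion = EmbraceFusion([d_model] * len(dims), d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, audio, video, text, availability=None):
        # One token per modality; self-attention mixes them crossmodally.
        tokens = torch.stack(
            [p(x) for p, x in zip(self.proj, (audio, video, text))], dim=1
        )  # (batch, 3, d_model)
        attended = self.crossmodal(tokens)
        fused = self.fusion(list(attended.unbind(dim=1)), availability)
        return self.head(fused)


# Usage with random stand-in features (batch of 8 utterances):
model = CrossmodalEmotionClassifier()
a, v, t = torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 768)
logits = model(a, v, t)  # shape (8, 7)
```

EmbraceNet's per-coordinate sampling is what makes this style of fusion robust: if a modality is degraded or missing at inference time, zeroing its entry in `availability` reroutes every feature coordinate to the remaining modalities instead of feeding the classifier a degenerate input.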