Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion

Decades of scientific research have been conducted on developing and evaluating methods for automated emotion recognition. With exponentially growing technology, there is a wide range of emerging applications that require recognition of the user's emotional state. This paper investigates a robust approach for multimodal emotion recognition during a conversation. Three separate models for the audio, video, and text modalities are structured and fine-tuned on the MELD dataset. In this paper, a transformer-based crossmodality fusion with the EmbraceNet architecture is employed to estimate the emotion. The proposed multimodal network architecture can achieve up to 65% accuracy, which significantly surpasses any of the unimodal models. We provide multiple evaluation techniques applied to our work to show that our model is robust and can even outperform state-of-the-art models on the MELD dataset.
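The record itself contains no code, but the scheme the abstract describes — separate unimodal encoders whose embeddings are combined by a transformer-based crossmodality fusion module — can be sketched compactly. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the module name CrossmodalFusionClassifier, all feature dimensions, the layer counts, and the mean-pooling step (which stands in for EmbraceNet's stochastic per-feature modality selection) are placeholders; only the seven-class output matches MELD's emotion label set.

# A minimal, illustrative sketch of transformer-based crossmodality fusion
# in the spirit of the abstract above. NOT the authors' code: module names,
# dimensions, layer counts, and the pooling step are assumptions.
import torch
import torch.nn as nn


class CrossmodalFusionClassifier(nn.Module):  # hypothetical name
    def __init__(self, audio_dim=128, video_dim=512, text_dim=768,
                 d_model=256, num_classes=7):  # MELD has 7 emotion labels
        super().__init__()
        # "Docking" projections map each unimodal embedding into a shared
        # space (EmbraceNet-style; the dimensions are placeholders).
        self.dock_audio = nn.Linear(audio_dim, d_model)
        self.dock_video = nn.Linear(video_dim, d_model)
        self.dock_text = nn.Linear(text_dim, d_model)
        # A small transformer encoder attends across the three modality tokens.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, audio, video, text):
        # Stack the docked modality embeddings as a 3-token sequence.
        tokens = torch.stack([self.dock_audio(audio),
                              self.dock_video(video),
                              self.dock_text(text)], dim=1)   # (B, 3, d_model)
        fused = self.encoder(tokens)                           # (B, 3, d_model)
        # EmbraceNet proper draws each output feature from one modality at
        # random; a mean over modality tokens is used here as a simplification.
        pooled = fused.mean(dim=1)                             # (B, d_model)
        return self.classifier(pooled)                         # emotion logits


# Example with one batch of pre-extracted unimodal utterance features.
model = CrossmodalFusionClassifier()
logits = model(torch.randn(8, 128), torch.randn(8, 512), torch.randn(8, 768))
print(logits.shape)  # torch.Size([8, 7])

In the paper's pipeline the unimodal audio, video, and text models are first fine-tuned on MELD; the random tensors above merely stand in for their output features.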

Bibliographic Details
Main authors: Xie, Baijun; Sidulova, Mariia; Park, Chung Hyuk
Format: Online Article Text
Language: English
Published: Sensors (Basel), MDPI, 19 July 2021
Subjects: Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8309929/
https://www.ncbi.nlm.nih.gov/pubmed/34300651
http://dx.doi.org/10.3390/s21144913
Collection: PubMed (record pubmed-8309929, National Center for Biotechnology Information; MEDLINE/PubMed record format)
License: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).