
Multi-Modality Emotion Recognition Model with GAT-Based Multi-Head Inter-Modality Attention


Bibliographic Details
Main Authors: Fu, Changzeng; Liu, Chaoran; Ishi, Carlos Toshinori; Ishiguro, Hiroshi
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7506856/
https://www.ncbi.nlm.nih.gov/pubmed/32872511
http://dx.doi.org/10.3390/s20174894
Description
Summary: Emotion recognition has been gaining attention in recent years due to its applications in artificial agents. To achieve good performance on this task, much research has been conducted on multi-modality emotion recognition models that leverage the different strengths of each modality. However, a research question remains: what exactly is the most appropriate way to fuse the information from different modalities? In this paper, we proposed audio sample augmentation and an emotion-oriented encoder-decoder to improve the performance of emotion recognition, and we discussed an inter-modality, decision-level fusion method based on a graph attention network (GAT). Compared to the baseline, our model improved the weighted average F1-score from 64.18% to 68.31% and the weighted average accuracy from 65.25% to 69.88%.
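
To give a rough idea of the GAT-based inter-modality fusion the abstract describes, the sketch below treats per-modality feature vectors (e.g., audio, text, video) as nodes of a small fully connected graph and fuses them with multi-head graph attention before a decision-level classifier. This is a minimal illustration, not the authors' released code: the class name InterModalityGAT, the dimensions, the three-modality setup, and the mean-pooling fusion step are all assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class InterModalityGAT(nn.Module):
    """Multi-head GAT-style attention over a fully connected graph whose
    nodes are per-modality feature vectors (hypothetical sketch)."""

    def __init__(self, in_dim: int, out_dim: int, num_heads: int = 4, num_classes: int = 4):
        super().__init__()
        assert out_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = out_dim // num_heads
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        # Additive-attention parameters: one vector per head for source and target nodes.
        self.attn_src = nn.Parameter(torch.empty(num_heads, self.head_dim))
        self.attn_dst = nn.Parameter(torch.empty(num_heads, self.head_dim))
        nn.init.xavier_uniform_(self.attn_src)
        nn.init.xavier_uniform_(self.attn_dst)
        self.classifier = nn.Linear(out_dim, num_classes)

    def forward(self, modality_feats: torch.Tensor) -> torch.Tensor:
        # modality_feats: (batch, num_modalities, in_dim)
        B, M, _ = modality_feats.shape
        h = self.proj(modality_feats).view(B, M, self.num_heads, self.head_dim)
        # GAT-style attention score for every ordered pair of modality nodes.
        score_src = (h * self.attn_src).sum(-1)                    # (B, M, H)
        score_dst = (h * self.attn_dst).sum(-1)                    # (B, M, H)
        scores = score_src.unsqueeze(2) + score_dst.unsqueeze(1)   # (B, M, M, H)
        alpha = F.softmax(F.leaky_relu(scores, 0.2), dim=2)
        # Aggregate the other modalities' features per head, then concatenate heads.
        out = torch.einsum("bijh,bjhd->bihd", alpha, h).reshape(B, M, -1)
        # Decision-level fusion: pool the updated modality nodes and classify.
        return self.classifier(out.mean(dim=1))


# Usage (assumed shapes): fuse audio/text/video embeddings projected to 128 dims.
fusion = InterModalityGAT(in_dim=128, out_dim=128, num_heads=4, num_classes=4)
feats = torch.randn(8, 3, 128)   # batch of 8, three modalities
logits = fusion(feats)           # (8, 4) emotion logits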