GCF(2)-Net: global-aware cross-modal feature fusion network for speech emotion recognition
Emotion recognition plays an essential role in interpersonal communication. However, existing recognition systems use only features of a single modality for emotion recognition, ignoring the interaction of information from the different modalities. Therefore, in our study, we propose a global-aware...
Main Authors: | Li, Feng; Luo, Jiusong; Wang, Lingling; Liu, Wei; Sang, Xiaoshuang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2023 |
Subjects: | Neuroscience |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10192703/ https://www.ncbi.nlm.nih.gov/pubmed/37214410 http://dx.doi.org/10.3389/fnins.2023.1183132 |
_version_ | 1785043681894465536 |
---|---|
author | Li, Feng Luo, Jiusong Wang, Lingling Liu, Wei Sang, Xiaoshuang |
author_facet | Li, Feng Luo, Jiusong Wang, Lingling Liu, Wei Sang, Xiaoshuang |
author_sort | Li, Feng |
collection | PubMed |
description | Emotion recognition plays an essential role in interpersonal communication. However, existing recognition systems use only features of a single modality for emotion recognition, ignoring the interaction of information from the different modalities. Therefore, in our study, we propose a global-aware Cross-modal feature Fusion Network (GCF(2)-Net) for recognizing emotion. We construct a residual cross-modal fusion attention module (ResCMFA) to fuse information from multiple modalities and design a global-aware module to capture global details. More specifically, we first use transfer learning to extract wav2vec 2.0 features and text features, which are then fused by the ResCMFA module. Then, the cross-modal fusion features are fed into the global-aware module to capture the most essential emotional information globally. Finally, the experimental results show that our proposed method has significant advantages over state-of-the-art methods on the IEMOCAP and MELD datasets. An illustrative sketch of this pipeline is given after the record below. |
format | Online Article Text |
id | pubmed-10192703 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-101927032023-05-19 GCF(2)-Net: global-aware cross-modal feature fusion network for speech emotion recognition Li, Feng Luo, Jiusong Wang, Lingling Liu, Wei Sang, Xiaoshuang Front Neurosci Neuroscience Emotion recognition plays an essential role in interpersonal communication. However, existing recognition systems use only features of a single modality for emotion recognition, ignoring the interaction of information from the different modalities. Therefore, in our study, we propose a global-aware Cross-modal feature Fusion Network (GCF(2)-Net) for recognizing emotion. We construct a residual cross-modal fusion attention module (ResCMFA) to fuse information from multiple modalities and design a global-aware module to capture global details. More specifically, we first use transfer learning to extract wav2vec 2.0 features and text features, which are then fused by the ResCMFA module. Then, the cross-modal fusion features are fed into the global-aware module to capture the most essential emotional information globally. Finally, the experimental results show that our proposed method has significant advantages over state-of-the-art methods on the IEMOCAP and MELD datasets. Frontiers Media S.A. 2023-05-04 /pmc/articles/PMC10192703/ /pubmed/37214410 http://dx.doi.org/10.3389/fnins.2023.1183132 Text en Copyright © 2023 Li, Luo, Wang, Liu and Sang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience Li, Feng Luo, Jiusong Wang, Lingling Liu, Wei Sang, Xiaoshuang GCF(2)-Net: global-aware cross-modal feature fusion network for speech emotion recognition |
title | GCF(2)-Net: global-aware cross-modal feature fusion network for speech emotion recognition |
title_full | GCF(2)-Net: global-aware cross-modal feature fusion network for speech emotion recognition |
title_fullStr | GCF(2)-Net: global-aware cross-modal feature fusion network for speech emotion recognition |
title_full_unstemmed | GCF(2)-Net: global-aware cross-modal feature fusion network for speech emotion recognition |
title_short | GCF(2)-Net: global-aware cross-modal feature fusion network for speech emotion recognition |
title_sort | gcf(2)-net: global-aware cross-modal feature fusion network for speech emotion recognition |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10192703/ https://www.ncbi.nlm.nih.gov/pubmed/37214410 http://dx.doi.org/10.3389/fnins.2023.1183132 |
work_keys_str_mv | AT lifeng gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition AT luojiusong gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition AT wanglingling gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition AT liuwei gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition AT sangxiaoshuang gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition |
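The description above only names the building blocks of the method. As a rough illustration of how such a pipeline could be wired together, the sketch below fuses pre-extracted wav2vec 2.0 audio features and text features with cross-modal attention plus a residual connection, then applies a self-attention "global-aware" stage before classification. All module internals, dimensions, and pooling choices here are assumptions made for illustration; they are not taken from the paper's implementation.

```python
import torch
import torch.nn as nn


class ResCMFABlock(nn.Module):
    """Assumed realization of a residual cross-modal fusion attention block:
    one modality's sequence attends to the other's, with a residual connection."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_seq: torch.Tensor, context_seq: torch.Tensor) -> torch.Tensor:
        attended, _ = self.cross_attn(query_seq, context_seq, context_seq)
        return self.norm(query_seq + attended)  # residual fusion


class GlobalAwareHead(nn.Module):
    """Assumed global-aware stage: self-attention over the fused sequence,
    mean-pooled into an utterance-level vector, then a linear classifier."""

    def __init__(self, dim: int, num_classes: int, num_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, fused_seq: torch.Tensor) -> torch.Tensor:
        globally_aware, _ = self.self_attn(fused_seq, fused_seq, fused_seq)
        pooled = globally_aware.mean(dim=1)  # global pooling over time steps
        return self.classifier(pooled)


class GCF2NetSketch(nn.Module):
    """Audio features (e.g., from wav2vec 2.0) and text features in, emotion logits out."""

    def __init__(self, audio_dim: int = 768, text_dim: int = 768,
                 dim: int = 256, num_classes: int = 4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, dim)
        self.text_proj = nn.Linear(text_dim, dim)
        self.audio_to_text = ResCMFABlock(dim)  # audio queries attend to text
        self.text_to_audio = ResCMFABlock(dim)  # text queries attend to audio
        self.head = GlobalAwareHead(dim, num_classes)

    def forward(self, audio_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        a = self.audio_proj(audio_feats)              # (B, T_audio, dim)
        t = self.text_proj(text_feats)                # (B, T_text, dim)
        a_fused = self.audio_to_text(a, t)
        t_fused = self.text_to_audio(t, a)
        fused = torch.cat([a_fused, t_fused], dim=1)  # join along the time axis
        return self.head(fused)                       # (B, num_classes)


if __name__ == "__main__":
    model = GCF2NetSketch()
    audio = torch.randn(2, 120, 768)  # e.g., wav2vec 2.0 frame features
    text = torch.randn(2, 30, 768)    # e.g., token embeddings from a text encoder
    print(model(audio, text).shape)   # torch.Size([2, 4])
```

In this sketch the two ResCMFA blocks are symmetric (audio attends to text and vice versa) and the fused sequences are simply concatenated along the time axis before the global stage; the paper's actual fusion and pooling strategy may differ.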