Multimodal interaction enhanced representation learning for video emotion recognition

Video emotion recognition aims to infer human emotional states from the audio, visual, and text modalities. Previous approaches are centered around designing sophisticated fusion mechanisms, but usually ignore the fact that text contains global semantic information, while speech and face video show...

Bibliographic Details
Main authors: Xia, Xiaohan; Zhao, Yong; Jiang, Dongmei
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806211/
https://www.ncbi.nlm.nih.gov/pubmed/36601594
http://dx.doi.org/10.3389/fnins.2022.1086380