
Context-Aware Emotion Recognition in the Wild Using Spatio-Temporal and Temporal-Pyramid Models

Emotion recognition plays an important role in human–computer interactions. Recent studies have focused on video emotion recognition in the wild and have run into difficulties related to occlusion, illumination, complex behavior over time, and auditory cues. State-of-the-art methods use multiple modalities, such as frame-level, spatiotemporal, and audio approaches. However, such methods have difficulties in exploiting long-term dependencies in temporal information, capturing contextual information, and integrating multi-modal information. In this paper, we introduce a multi-modal flexible system for video-based emotion recognition in the wild. Our system tracks and votes on significant faces corresponding to persons of interest in a video to classify seven basic emotions. The key contribution of this study is that it proposes the use of face feature extraction with context-aware and statistical information for emotion recognition. We also build two model architectures to effectively exploit long-term dependencies in temporal information with a temporal-pyramid model and a spatiotemporal model with “Conv2D+LSTM+3DCNN+Classify” architecture. Finally, we propose the best selection ensemble to improve the accuracy of multi-modal fusion. The best selection ensemble selects the best combination from spatiotemporal and temporal-pyramid models to achieve the best accuracy for classifying the seven basic emotions. In our experiment, we take benchmark measurement on the AFEW dataset with high accuracy.

Bibliographic Details
Main Authors: Do, Nhu-Tai, Kim, Soo-Hyung, Yang, Hyung-Jeong, Lee, Guee-Sang, Yeom, Soonja
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8036494/
https://www.ncbi.nlm.nih.gov/pubmed/33801739
http://dx.doi.org/10.3390/s21072344
_version_ 1783676923277213696
author Do, Nhu-Tai
Kim, Soo-Hyung
Yang, Hyung-Jeong
Lee, Guee-Sang
Yeom, Soonja
author_sort Do, Nhu-Tai
collection PubMed
description Emotion recognition plays an important role in human–computer interactions. Recent studies have focused on video emotion recognition in the wild and have run into difficulties related to occlusion, illumination, complex behavior over time, and auditory cues. State-of-the-art methods use multiple modalities, such as frame-level, spatiotemporal, and audio approaches. However, such methods have difficulties in exploiting long-term dependencies in temporal information, capturing contextual information, and integrating multi-modal information. In this paper, we introduce a multi-modal flexible system for video-based emotion recognition in the wild. Our system tracks and votes on significant faces corresponding to persons of interest in a video to classify seven basic emotions. The key contribution of this study is that it proposes the use of face feature extraction with context-aware and statistical information for emotion recognition. We also build two model architectures to effectively exploit long-term dependencies in temporal information with a temporal-pyramid model and a spatiotemporal model with “Conv2D+LSTM+3DCNN+Classify” architecture. Finally, we propose the best selection ensemble to improve the accuracy of multi-modal fusion. The best selection ensemble selects the best combination from spatiotemporal and temporal-pyramid models to achieve the best accuracy for classifying the seven basic emotions. In our experiment, we take benchmark measurement on the AFEW dataset with high accuracy.
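The abstract describes a "best selection ensemble" that searches for the combination of spatiotemporal and temporal-pyramid models giving the highest accuracy on the seven basic emotions. A minimal sketch of such an exhaustive subset search is shown below, assuming each model outputs a per-clip probability vector over the emotion classes and subsets are fused by averaging; all function names here are illustrative, not taken from the paper.

```python
from itertools import combinations

# Seven basic emotion classes used on AFEW
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]

def accuracy(probs, labels):
    # Fraction of samples whose argmax class matches the ground-truth label
    correct = sum(
        1 for p, y in zip(probs, labels)
        if max(range(len(p)), key=p.__getitem__) == y
    )
    return correct / len(labels)

def average_probs(model_probs):
    # Element-wise mean of several models' per-sample probability vectors
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    return [
        [sum(m[i][c] for m in model_probs) / n_models for c in range(n_classes)]
        for i in range(n_samples)
    ]

def best_selection_ensemble(models, labels):
    # Try every non-empty combination of models on a validation set and
    # keep the one whose averaged predictions score the highest accuracy.
    names = list(models)
    best_subset, best_acc = None, -1.0
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            acc = accuracy(average_probs([models[n] for n in subset]), labels)
            if acc > best_acc:
                best_subset, best_acc = subset, acc
    return best_subset, best_acc
```

The exhaustive search is feasible here because only a handful of candidate models (spatiotemporal, temporal-pyramid, audio) are combined; with many models a greedy selection would be the usual fallback.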
format Online
Article
Text
id pubmed-8036494
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8036494 2021-04-12 Context-Aware Emotion Recognition in the Wild Using Spatio-Temporal and Temporal-Pyramid Models Do, Nhu-Tai; Kim, Soo-Hyung; Yang, Hyung-Jeong; Lee, Guee-Sang; Yeom, Soonja. Sensors (Basel), Article. MDPI 2021-03-27 /pmc/articles/PMC8036494/ /pubmed/33801739 http://dx.doi.org/10.3390/s21072344 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Context-Aware Emotion Recognition in the Wild Using Spatio-Temporal and Temporal-Pyramid Models
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8036494/
https://www.ncbi.nlm.nih.gov/pubmed/33801739
http://dx.doi.org/10.3390/s21072344
work_keys_str_mv AT donhutai contextawareemotionrecognitioninthewildusingspatiotemporalandtemporalpyramidmodels
AT kimsoohyung contextawareemotionrecognitioninthewildusingspatiotemporalandtemporalpyramidmodels
AT yanghyungjeong contextawareemotionrecognitioninthewildusingspatiotemporalandtemporalpyramidmodels
AT leegueesang contextawareemotionrecognitioninthewildusingspatiotemporalandtemporalpyramidmodels
AT yeomsoonja contextawareemotionrecognitioninthewildusingspatiotemporalandtemporalpyramidmodels