
Hybrid Attention Cascade Network for Facial Expression Recognition

Bibliographic Details
Main Authors: Zhu, Xiaoliang, Ye, Shihao, Zhao, Liang, Dai, Zhicheng
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8002145/
https://www.ncbi.nlm.nih.gov/pubmed/33809038
http://dx.doi.org/10.3390/s21062003
_version_ 1783671395232776192
author Zhu, Xiaoliang
Ye, Shihao
Zhao, Liang
Dai, Zhicheng
author_facet Zhu, Xiaoliang
Ye, Shihao
Zhao, Liang
Dai, Zhicheng
author_sort Zhu, Xiaoliang
collection PubMed
description The AFEW (Acted Facial Expressions in the Wild) dataset, part of the EmotiW (Emotion Recognition in the Wild) challenge, is a popular benchmark for emotion recognition under various in-the-wild constraints, including uneven illumination, head deflection, and varied facial posture. In this paper, we propose a convenient facial expression recognition cascade network comprising spatial feature extraction, hybrid attention, and temporal feature extraction. First, faces are detected in each frame of a video sequence, and the corresponding face ROI (region of interest) is extracted to obtain the face images. The face images in each frame are then aligned based on the positions of the facial feature points. Second, the aligned face images are fed into a residual neural network to extract the spatial features of the facial expressions, and these spatial features are passed to the hybrid attention module to obtain fused facial expression features. Finally, the fused features are fed into a gated recurrent unit to extract the temporal features of the facial expressions, and the temporal features are passed to a fully connected layer to classify and recognize the expressions. Experiments on the CK+ (Extended Cohn-Kanade), Oulu-CASIA (Oulu University and Institute of Automation, Chinese Academy of Sciences), and AFEW datasets yielded recognition accuracies of 98.46%, 87.31%, and 53.44%, respectively. This demonstrates that the proposed method not only achieves performance competitive with state-of-the-art methods but also improves accuracy on the AFEW dataset by more than 2%, confirming its effectiveness for facial expression recognition in natural environments.
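The cascade described above (per-frame spatial features → hybrid attention fusion → recurrent temporal modeling → classification) can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch, not the authors' implementation: the small CNN stands in for the residual network, and `HybridAttention` is a hypothetical combination of channel-wise and frame-wise attention, since the abstract does not specify the module's internals.

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    """Hypothetical hybrid attention: a channel gate re-weights each frame's
    feature vector, and a softmax over frames re-weights the sequence."""
    def __init__(self, dim):
        super().__init__()
        self.channel = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.ReLU(),
            nn.Linear(dim // 4, dim), nn.Sigmoid())
        self.frame = nn.Linear(dim, 1)

    def forward(self, x):                 # x: (batch, frames, dim)
        x = x * self.channel(x)           # channel attention per frame
        w = torch.softmax(self.frame(x), dim=1)
        return x * w                      # frame-level attention weights

class CascadeNet(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, classes=7):
        super().__init__()
        # Stand-in for the ResNet spatial extractor: frame image -> feature vector.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim))
        self.attn = HybridAttention(feat_dim)
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, classes)

    def forward(self, clips):             # clips: (batch, frames, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.spatial(clips.flatten(0, 1)).view(b, t, -1)
        fused = self.attn(feats)          # hybrid-attention fusion features
        _, h = self.gru(fused)            # temporal features over the sequence
        return self.fc(h[-1])             # logits over expression classes
```

For example, a batch of two 8-frame aligned face clips, `torch.randn(2, 8, 3, 64, 64)`, produces a `(2, 7)` logit tensor for the seven basic expression classes.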
format Online
Article
Text
id pubmed-8002145
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8002145 2021-03-28 Hybrid Attention Cascade Network for Facial Expression Recognition Zhu, Xiaoliang Ye, Shihao Zhao, Liang Dai, Zhicheng Sensors (Basel) Article
MDPI 2021-03-12 /pmc/articles/PMC8002145/ /pubmed/33809038 http://dx.doi.org/10.3390/s21062003 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Zhu, Xiaoliang
Ye, Shihao
Zhao, Liang
Dai, Zhicheng
Hybrid Attention Cascade Network for Facial Expression Recognition
title Hybrid Attention Cascade Network for Facial Expression Recognition
title_full Hybrid Attention Cascade Network for Facial Expression Recognition
title_fullStr Hybrid Attention Cascade Network for Facial Expression Recognition
title_full_unstemmed Hybrid Attention Cascade Network for Facial Expression Recognition
title_short Hybrid Attention Cascade Network for Facial Expression Recognition
title_sort hybrid attention cascade network for facial expression recognition
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8002145/
https://www.ncbi.nlm.nih.gov/pubmed/33809038
http://dx.doi.org/10.3390/s21062003
work_keys_str_mv AT zhuxiaoliang hybridattentioncascadenetworkforfacialexpressionrecognition
AT yeshihao hybridattentioncascadenetworkforfacialexpressionrecognition
AT zhaoliang hybridattentioncascadenetworkforfacialexpressionrecognition
AT daizhicheng hybridattentioncascadenetworkforfacialexpressionrecognition