
Facial Expression Recognition: One Attention-Modulated Contextual Spatial Information Network

Facial expression recognition (FER) in the wild is a challenging task due to uncontrolled factors such as occlusion, illumination, and pose variation. Current methods perform well under controlled conditions, but two issues remain for the in-the-wild FER task: (i) insufficient description of long-range dependencies among expression features in the facial information space and (ii) insufficiently refined subtle inter-class distinctions among multiple expressions in the wild.

Bibliographic Details
Main Authors: Li, Xue; Zhu, Chunhua; Zhou, Fei
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9324190/
https://www.ncbi.nlm.nih.gov/pubmed/35885106
http://dx.doi.org/10.3390/e24070882
_version_ 1784756746834673664
author Li, Xue
Zhu, Chunhua
Zhou, Fei
author_facet Li, Xue
Zhu, Chunhua
Zhou, Fei
author_sort Li, Xue
collection PubMed
description Facial expression recognition (FER) in the wild is a challenging task due to uncontrolled factors such as occlusion, illumination, and pose variation. Current methods perform well under controlled conditions; however, two issues remain with the in-the-wild FER task: (i) insufficient description of long-range dependencies among expression features in the facial information space and (ii) insufficiently refined subtle inter-class distinctions among multiple expressions in the wild. To overcome these issues, this paper presents an end-to-end model for FER, named the attention-modulated contextual spatial information network (ACSI-Net), which embeds coordinate attention (CA) modules into a contextual convolutional residual network (CoResNet). First, CoResNet is constructed by arranging contextual convolution (CoConv) blocks at different levels to integrate facial expression features with long-range dependencies, generating a holistic representation of the spatial information of facial expression. Then, CA modules are inserted into different stages of CoResNet; at each stage, the subtle facial expression information acquired from the CoConv blocks is first modulated by the corresponding CA module across channels and spatial locations and then flows into the next layer. Finally, to highlight facial regions related to expression, a CA module at the end of the network produces attention masks that are multiplied by its input feature maps, focusing the model on salient regions. Unlike other models, ACSI-Net explores intrinsic dependencies between features and yields a discriminative representation for facial expression classification. Extensive experimental results on the AffectNet and RAF_DB datasets demonstrate its effectiveness and competitiveness compared to other FER methods.
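The final step the abstract describes, an attention module producing masks that are multiplied by its input feature maps to emphasize salient regions, can be sketched as follows. This is a simplified NumPy illustration, not the authors' implementation: a full coordinate attention module also applies learned 1x1 convolutions and normalization between the directional pooling and the sigmoid, which are omitted here for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention_mask(x):
    """Apply a simplified coordinate-attention mask to a feature map.

    x: feature map of shape (C, H, W).
    Pools along each spatial axis separately (the idea behind coordinate
    attention), turns the pooled profiles into per-row and per-column
    weights in (0, 1), and multiplies them back onto the input, so each
    channel is re-weighted by where its activations concentrate.
    """
    pooled_h = x.mean(axis=2)                # (C, H): average over width
    pooled_w = x.mean(axis=1)                # (C, W): average over height
    attn_h = sigmoid(pooled_h)[:, :, None]   # (C, H, 1) row weights
    attn_w = sigmoid(pooled_w)[:, None, :]   # (C, 1, W) column weights
    return x * attn_h * attn_w               # broadcasted mask multiplication

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))        # toy feature map, C=4, 8x8 spatial
out = coordinate_attention_mask(feat)
```

Because the masks lie strictly in (0, 1), every activation is attenuated rather than amplified; in the trained network, the learned transformations decide which rows and columns (and hence which facial regions) are suppressed least.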
format Online
Article
Text
id pubmed-9324190
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9324190 2022-07-27 Facial Expression Recognition: One Attention-Modulated Contextual Spatial Information Network Li, Xue Zhu, Chunhua Zhou, Fei Entropy (Basel) Article MDPI 2022-06-27 /pmc/articles/PMC9324190/ /pubmed/35885106 http://dx.doi.org/10.3390/e24070882 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Li, Xue
Zhu, Chunhua
Zhou, Fei
Facial Expression Recognition: One Attention-Modulated Contextual Spatial Information Network
title Facial Expression Recognition: One Attention-Modulated Contextual Spatial Information Network
title_full Facial Expression Recognition: One Attention-Modulated Contextual Spatial Information Network
title_fullStr Facial Expression Recognition: One Attention-Modulated Contextual Spatial Information Network
title_full_unstemmed Facial Expression Recognition: One Attention-Modulated Contextual Spatial Information Network
title_short Facial Expression Recognition: One Attention-Modulated Contextual Spatial Information Network
title_sort facial expression recognition: one attention-modulated contextual spatial information network
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9324190/
https://www.ncbi.nlm.nih.gov/pubmed/35885106
http://dx.doi.org/10.3390/e24070882
work_keys_str_mv AT lixue facialexpressionrecognitiononeattentionmodulatedcontextualspatialinformationnetwork
AT zhuchunhua facialexpressionrecognitiononeattentionmodulatedcontextualspatialinformationnetwork
AT zhoufei facialexpressionrecognitiononeattentionmodulatedcontextualspatialinformationnetwork