
Face-mask-aware Facial Expression Recognition based on Face Parsing and Vision Transformer

Bibliographic Details
Main Authors: Yang, Bo, Wu, Jianming, Ikeda, Kazushi, Hattori, Gen, Sugano, Masaru, Iwasawa, Yusuke, Matsuo, Yutaka
Format: Online Article Text
Language: English
Published: Elsevier B.V. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9645067/
https://www.ncbi.nlm.nih.gov/pubmed/36407855
http://dx.doi.org/10.1016/j.patrec.2022.11.004
Description
Summary: As wearing face masks has become an established practice due to the COVID-19 pandemic, facial expression recognition (FER) that takes face masks into account is now a problem that needs to be solved. In this paper, we propose a face parsing and vision Transformer-based method to improve the accuracy of face-mask-aware FER. First, to more precisely distinguish the unobstructed facial region from the parts of the face covered by a mask, we re-train a face-mask-aware face parsing model on an existing face parsing dataset that is automatically relabeled with face-mask pixel labels. Second, we propose an FER classifier based on a vision Transformer with a cross-attention mechanism, capable of taking both occluded and non-occluded facial regions into account and automatically reweighting these two parts to obtain the best facial expression recognition performance. The proposed method outperforms existing state-of-the-art face-mask-aware FER methods, as well as other occlusion-aware FER methods, on two datasets that contain three kinds of emotions (the M-LFW-FER and M-KDDI-FER datasets) and two datasets that contain seven kinds of emotions (the M-FER-2013 and M-CK+ datasets).
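
To make the cross-attention reweighting idea concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: the module name, feature dimensions, token shapes, and the softmax gating used to reweight the two region streams are all illustrative assumptions about how features from the non-occluded and mask-occluded facial regions could attend to each other before classification.

```python
# Hypothetical sketch of cross-attention fusion over two facial regions;
# all names and shapes are assumptions, not the paper's actual code.
import torch
import torch.nn as nn


class RegionCrossAttentionFusion(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, n_classes: int = 7):
        super().__init__()
        # Each region's tokens attend to the other region's tokens.
        self.attn_unmasked = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_masked = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Learned gate that reweights the two pooled region features.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, 2), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, unmasked_tokens, masked_tokens):
        # unmasked_tokens: (B, N_u, d_model) from the visible facial region
        # masked_tokens:   (B, N_m, d_model) from the region covered by a mask
        u, _ = self.attn_unmasked(unmasked_tokens, masked_tokens, masked_tokens)
        m, _ = self.attn_masked(masked_tokens, unmasked_tokens, unmasked_tokens)
        u_pool, m_pool = u.mean(dim=1), m.mean(dim=1)       # (B, d_model) each
        w = self.gate(torch.cat([u_pool, m_pool], dim=-1))  # (B, 2) region weights
        fused = w[:, :1] * u_pool + w[:, 1:] * m_pool       # reweighted fusion
        return self.classifier(fused)                       # expression logits


if __name__ == "__main__":
    model = RegionCrossAttentionFusion()
    logits = model(torch.randn(2, 49, 256), torch.randn(2, 49, 256))
    print(logits.shape)  # torch.Size([2, 7])
```

In this sketch the gate outputs two softmax weights per sample, so the network can learn how much to trust the occluded versus non-occluded stream; how the paper actually reweights the two parts should be taken from the full text rather than this illustration.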