Progressive Multi-Scale Vision Transformer for Facial Action Unit Detection
Facial action unit (AU) detection is an important task in affective computing and has attracted extensive attention in the fields of computer vision and artificial intelligence. Previous studies on AU detection usually encode complex regional feature representations with manually defined facial landmarks and learn to model the relationships among AUs via graph neural networks. Although some progress has been achieved, it is still difficult for existing methods to capture the exclusive and concurrent relationships among different combinations of facial AUs. To circumvent this issue, we propose a new progressive multi-scale vision transformer (PMVT) that captures the complex relationships among different AUs across a wide range of expressions in a data-driven fashion. PMVT is based on a multi-scale self-attention mechanism that can flexibly attend to a sequence of image patches to encode the critical cues for AUs. Compared with previous AU detection methods, the benefits of PMVT are two-fold: (i) PMVT does not rely on manually defined facial landmarks to extract regional representations, and (ii) PMVT can encode facial regions with adaptive receptive fields, thus facilitating flexible representation of different AUs. Experimental results show that PMVT improves AU detection accuracy on the popular BP4D and DISFA datasets. Compared with other state-of-the-art AU detection methods, PMVT obtains consistent improvements. Visualization results show that PMVT automatically perceives the discriminative facial regions for robust AU detection.
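The abstract describes the core idea at a high level: patch tokens extracted at several spatial scales are processed with self-attention and mapped to independent per-AU predictions. As a rough illustration of that idea only (not the authors' published PMVT architecture), the following PyTorch sketch embeds an image at three assumed patch scales, runs a single self-attention layer over the combined token sequence, and outputs one logit per AU. All names, patch sizes, dimensions, and the 12-AU output head are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch of multi-scale patch self-attention for multi-label AU
# detection. Patch scales, embedding size, and the 12-AU head are assumptions
# chosen for demonstration; they do not reproduce the published PMVT design.
import torch
import torch.nn as nn

class MultiScalePatchAttention(nn.Module):
    """Embeds an image at several patch scales, applies self-attention over
    the concatenated patch tokens, and predicts per-AU logits."""

    def __init__(self, patch_sizes=(8, 16, 32), dim=256, num_heads=8, num_aus=12):
        super().__init__()
        # One convolutional patch embedding per scale (stride = patch size).
        self.embeds = nn.ModuleList(
            nn.Conv2d(3, dim, kernel_size=p, stride=p) for p in patch_sizes
        )
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_aus)  # one logit per AU (multi-label)

    def forward(self, x):                      # x: (B, 3, H, W)
        tokens = []
        for embed in self.embeds:
            t = embed(x)                       # (B, dim, H/p, W/p)
            tokens.append(t.flatten(2).transpose(1, 2))  # (B, N_p, dim)
        tokens = torch.cat(tokens, dim=1)      # all scales share one sequence
        attn_out, _ = self.attn(tokens, tokens, tokens)
        feats = self.norm(tokens + attn_out).mean(dim=1)  # pooled token feature
        return self.head(feats)                # (B, num_aus) logits

# Multi-label AU detection is typically trained with an independent
# sigmoid/BCE term per AU, as sketched here on random data.
model = MultiScalePatchAttention()
images = torch.randn(2, 3, 224, 224)
labels = torch.randint(0, 2, (2, 12)).float()
loss = nn.BCEWithLogitsLoss()(model(images), labels)
```

Because every scale contributes its own token set to a single attention sequence, fine patches can capture localized AUs (e.g., around the eyes) while coarse patches cover AUs spanning larger regions; the adaptive receptive fields mentioned in the abstract play an analogous role.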
Main Authors: | Wang, Chongwen; Wang, Zicheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Neuroscience |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8790567/ https://www.ncbi.nlm.nih.gov/pubmed/35095460 http://dx.doi.org/10.3389/fnbot.2021.824592 |
Field | Value |
---|---|
_version_ | 1784640044145836032 |
author | Wang, Chongwen Wang, Zicheng |
author_facet | Wang, Chongwen Wang, Zicheng |
author_sort | Wang, Chongwen |
collection | PubMed |
description | Facial action unit (AU) detection is an important task in affective computing and has attracted extensive attention in the fields of computer vision and artificial intelligence. Previous studies on AU detection usually encode complex regional feature representations with manually defined facial landmarks and learn to model the relationships among AUs via graph neural networks. Although some progress has been achieved, it is still difficult for existing methods to capture the exclusive and concurrent relationships among different combinations of facial AUs. To circumvent this issue, we propose a new progressive multi-scale vision transformer (PMVT) that captures the complex relationships among different AUs across a wide range of expressions in a data-driven fashion. PMVT is based on a multi-scale self-attention mechanism that can flexibly attend to a sequence of image patches to encode the critical cues for AUs. Compared with previous AU detection methods, the benefits of PMVT are two-fold: (i) PMVT does not rely on manually defined facial landmarks to extract regional representations, and (ii) PMVT can encode facial regions with adaptive receptive fields, thus facilitating flexible representation of different AUs. Experimental results show that PMVT improves AU detection accuracy on the popular BP4D and DISFA datasets. Compared with other state-of-the-art AU detection methods, PMVT obtains consistent improvements. Visualization results show that PMVT automatically perceives the discriminative facial regions for robust AU detection. |
format | Online Article Text |
id | pubmed-8790567 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8790567 2022-01-27 Progressive Multi-Scale Vision Transformer for Facial Action Unit Detection Wang, Chongwen Wang, Zicheng Front Neurorobot Neuroscience Facial action unit (AU) detection is an important task in affective computing and has attracted extensive attention in the fields of computer vision and artificial intelligence. Previous studies on AU detection usually encode complex regional feature representations with manually defined facial landmarks and learn to model the relationships among AUs via graph neural networks. Although some progress has been achieved, it is still difficult for existing methods to capture the exclusive and concurrent relationships among different combinations of facial AUs. To circumvent this issue, we propose a new progressive multi-scale vision transformer (PMVT) that captures the complex relationships among different AUs across a wide range of expressions in a data-driven fashion. PMVT is based on a multi-scale self-attention mechanism that can flexibly attend to a sequence of image patches to encode the critical cues for AUs. Compared with previous AU detection methods, the benefits of PMVT are two-fold: (i) PMVT does not rely on manually defined facial landmarks to extract regional representations, and (ii) PMVT can encode facial regions with adaptive receptive fields, thus facilitating flexible representation of different AUs. Experimental results show that PMVT improves AU detection accuracy on the popular BP4D and DISFA datasets. Compared with other state-of-the-art AU detection methods, PMVT obtains consistent improvements. Visualization results show that PMVT automatically perceives the discriminative facial regions for robust AU detection. Frontiers Media S.A. 2022-01-12 /pmc/articles/PMC8790567/ /pubmed/35095460 http://dx.doi.org/10.3389/fnbot.2021.824592 Text en Copyright © 2022 Wang and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience Wang, Chongwen Wang, Zicheng Progressive Multi-Scale Vision Transformer for Facial Action Unit Detection |
title | Progressive Multi-Scale Vision Transformer for Facial Action Unit Detection |
title_full | Progressive Multi-Scale Vision Transformer for Facial Action Unit Detection |
title_fullStr | Progressive Multi-Scale Vision Transformer for Facial Action Unit Detection |
title_full_unstemmed | Progressive Multi-Scale Vision Transformer for Facial Action Unit Detection |
title_short | Progressive Multi-Scale Vision Transformer for Facial Action Unit Detection |
title_sort | progressive multi-scale vision transformer for facial action unit detection |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8790567/ https://www.ncbi.nlm.nih.gov/pubmed/35095460 http://dx.doi.org/10.3389/fnbot.2021.824592 |
work_keys_str_mv | AT wangchongwen progressivemultiscalevisiontransformerforfacialactionunitdetection AT wangzicheng progressivemultiscalevisiontransformerforfacialactionunitdetection |