
Unsupervised Facial Action Representation Learning by Temporal Prediction


Bibliographic Details
Main Authors: Wang, Chongwen, Wang, Zicheng
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8965886/
https://www.ncbi.nlm.nih.gov/pubmed/35370591
http://dx.doi.org/10.3389/fnbot.2022.851847
_version_ 1784678533357895680
author Wang, Chongwen
Wang, Zicheng
author_facet Wang, Chongwen
Wang, Zicheng
author_sort Wang, Chongwen
collection PubMed
description Due to the cumbersome and expensive data collection process, facial action unit (AU) datasets are generally much smaller than those in other computer vision fields, so AU detection models trained on such insufficient data tend to overfit. Despite recent progress in AU detection, deployment of these models has been impeded by their limited generalization to unseen subjects and facial poses. In this paper, we propose to learn discriminative facial AU representations in a self-supervised manner. Since facial AUs show temporal consistency and evolution across consecutive facial frames, we develop a self-supervised pseudo signal based on temporally predictive coding (TPC) to capture these temporal characteristics. To further learn per-frame discriminativeness between sibling facial frames, we naturally incorporate frame-wise temporal contrastive learning into the self-supervised paradigm. TPC can be trained without AU annotations, which lets us use a large number of unlabeled facial videos to learn AU representations that are robust to undesired nuisances such as facial identity and pose. In contrast to previous AU detection works, our method requires neither manually selecting key facial regions nor explicitly modeling AU relations. Experimental results show that TPC improves AU detection precision on several popular AU benchmark datasets compared with other self-supervised AU detection methods.
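The abstract describes the method only at a high level. As an illustration (not the authors' implementation), the frame-wise temporal contrastive learning it mentions is commonly realized as an InfoNCE-style objective, where each predicted frame feature should match its temporally aligned target frame and differ from the other ("sibling") frames in the clip. A minimal numpy sketch, with all names and shapes hypothetical:

```python
import numpy as np

def frame_contrastive_loss(pred, frames, temperature=0.1):
    """InfoNCE-style frame-wise contrastive loss (illustrative sketch only).

    pred:   (T, D) array of predicted per-frame feature vectors.
    frames: (T, D) array of actual per-frame feature vectors.
    Prediction t is pulled toward frame t and pushed away from the
    other T-1 frames, which serve as negatives.
    """
    # L2-normalize so dot products become cosine similarities
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    frames = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    logits = pred @ frames.T / temperature        # (T, T) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives lie on the diagonal: frame t is the target of prediction t
    return -np.mean(np.diag(log_prob))

# Toy usage: predictions close to their own target frame give a low loss,
# while unrelated predictions give a loss near log(T).
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))
good_pred = frames + 0.01 * rng.normal(size=frames.shape)
print(frame_contrastive_loss(good_pred, frames))
```

This sketch uses random vectors in place of the CNN features a real pipeline would extract; the paper's actual architecture and loss weighting are not specified in the record above.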
format Online
Article
Text
id pubmed-8965886
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8965886 2022-03-31 Unsupervised Facial Action Representation Learning by Temporal Prediction Wang, Chongwen Wang, Zicheng Front Neurorobot Neuroscience Due to the cumbersome and expensive data collection process, facial action unit (AU) datasets are generally much smaller than those in other computer vision fields, so AU detection models trained on such insufficient data tend to overfit. Despite recent progress in AU detection, deployment of these models has been impeded by their limited generalization to unseen subjects and facial poses. In this paper, we propose to learn discriminative facial AU representations in a self-supervised manner. Since facial AUs show temporal consistency and evolution across consecutive facial frames, we develop a self-supervised pseudo signal based on temporally predictive coding (TPC) to capture these temporal characteristics. To further learn per-frame discriminativeness between sibling facial frames, we naturally incorporate frame-wise temporal contrastive learning into the self-supervised paradigm. TPC can be trained without AU annotations, which lets us use a large number of unlabeled facial videos to learn AU representations that are robust to undesired nuisances such as facial identity and pose. In contrast to previous AU detection works, our method requires neither manually selecting key facial regions nor explicitly modeling AU relations. Experimental results show that TPC improves AU detection precision on several popular AU benchmark datasets compared with other self-supervised AU detection methods. Frontiers Media S.A. 2022-03-16 /pmc/articles/PMC8965886/ /pubmed/35370591 http://dx.doi.org/10.3389/fnbot.2022.851847 Text en Copyright © 2022 Wang and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Wang, Chongwen
Wang, Zicheng
Unsupervised Facial Action Representation Learning by Temporal Prediction
title Unsupervised Facial Action Representation Learning by Temporal Prediction
title_full Unsupervised Facial Action Representation Learning by Temporal Prediction
title_fullStr Unsupervised Facial Action Representation Learning by Temporal Prediction
title_full_unstemmed Unsupervised Facial Action Representation Learning by Temporal Prediction
title_short Unsupervised Facial Action Representation Learning by Temporal Prediction
title_sort unsupervised facial action representation learning by temporal prediction
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8965886/
https://www.ncbi.nlm.nih.gov/pubmed/35370591
http://dx.doi.org/10.3389/fnbot.2022.851847
work_keys_str_mv AT wangchongwen unsupervisedfacialactionrepresentationlearningbytemporalprediction
AT wangzicheng unsupervisedfacialactionrepresentationlearningbytemporalprediction