Cargando…

End-to-End Training for Compound Expression Recognition

For a long time, expressions have been something that human beings are proud of. That is an essential difference between us and machines. With the development of computers, we are more eager to develop communication between humans and machines, especially communication with emotions. The emotional g...

Descripción completa

Detalles Bibliográficos
Autores principales:	Li, Hongfei, Li, Qing
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	MDPI 2020
Materias:	Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7506941/ https://www.ncbi.nlm.nih.gov/pubmed/32825666 http://dx.doi.org/10.3390/s20174727

_version_	1783585128421785600
author	Li, Hongfei Li, Qing
author_facet	Li, Hongfei Li, Qing
author_sort	Li, Hongfei
collection	PubMed
description	For a long time, expressions have been something that human beings are proud of. That is an essential difference between us and machines. With the development of computers, we are more eager to develop communication between humans and machines, especially communication with emotions. The emotional growth of computers is similar to the growth process of each of us, starting with a natural, intimate, and vivid interaction by observing and discerning emotions. Since the basic emotions, angry, disgusted, fearful, happy, neutral, sad and surprised are put forward, there are many researches based on basic emotions at present, but few on compound emotions. However, in real life, people’s emotions are complex. Single expressions cannot fully and accurately show people’s inner emotional changes, thus, exploration of compound expression recognition is very essential to daily life. In this paper, we recommend a scheme of combining spatial and frequency domain transform to implement end-to-end joint training based on model ensembling between models for appearance and geometric representations learning for the recognition of compound expressions in the wild. We are mainly devoted to digging the appearance and geometric information based on deep learning models. For appearance feature acquisition, we adopt the idea of transfer learning, introducing the ResNet50 model pretrained on VGGFace2 for face recognition to implement the fine-tuning process. Here, we try and compare two minds, one is that we utilize two static expression databases FER2013 and RAF Basic for basic emotion recognition to fine tune, the other is that we fine tune the model on the input three channels composed of images generated by DWT2 and WAVEDEC2 wavelet transforms based on rbio3.1 and sym1 wavelet bases respectively. For geometric feature acquisition, we firstly introduce a densesift operator to extract facial key points and their histogram descriptions. After that, we introduce deep SAE with a softmax function, stacked LSTM and Sequence-to-Sequence with stacked LSTM and define their structures by ourselves. Then, we feed the salient key points and their descriptions into three models to train respectively and compare their performances. When the model training for appearance and geometric features learning is completed, we combine the two models with category labels to achieve further end-to-end joint training, considering that ensembling models, which describe different information, can further improve recognition results. Finally, we validate the performance of our proposed framework on an RAF Compound database and achieve a recognition rate of 66.97%. Experiments show that integrating different models, which express different information, and achieving end-to-end training can quickly and effectively improve the performance of the recognition.
format	Online Article Text
id	pubmed-7506941
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-75069412020-09-30 End-to-End Training for Compound Expression Recognition Li, Hongfei Li, Qing Sensors (Basel) Article For a long time, expressions have been something that human beings are proud of. That is an essential difference between us and machines. With the development of computers, we are more eager to develop communication between humans and machines, especially communication with emotions. The emotional growth of computers is similar to the growth process of each of us, starting with a natural, intimate, and vivid interaction by observing and discerning emotions. Since the basic emotions, angry, disgusted, fearful, happy, neutral, sad and surprised are put forward, there are many researches based on basic emotions at present, but few on compound emotions. However, in real life, people’s emotions are complex. Single expressions cannot fully and accurately show people’s inner emotional changes, thus, exploration of compound expression recognition is very essential to daily life. In this paper, we recommend a scheme of combining spatial and frequency domain transform to implement end-to-end joint training based on model ensembling between models for appearance and geometric representations learning for the recognition of compound expressions in the wild. We are mainly devoted to digging the appearance and geometric information based on deep learning models. For appearance feature acquisition, we adopt the idea of transfer learning, introducing the ResNet50 model pretrained on VGGFace2 for face recognition to implement the fine-tuning process. Here, we try and compare two minds, one is that we utilize two static expression databases FER2013 and RAF Basic for basic emotion recognition to fine tune, the other is that we fine tune the model on the input three channels composed of images generated by DWT2 and WAVEDEC2 wavelet transforms based on rbio3.1 and sym1 wavelet bases respectively. For geometric feature acquisition, we firstly introduce a densesift operator to extract facial key points and their histogram descriptions. After that, we introduce deep SAE with a softmax function, stacked LSTM and Sequence-to-Sequence with stacked LSTM and define their structures by ourselves. Then, we feed the salient key points and their descriptions into three models to train respectively and compare their performances. When the model training for appearance and geometric features learning is completed, we combine the two models with category labels to achieve further end-to-end joint training, considering that ensembling models, which describe different information, can further improve recognition results. Finally, we validate the performance of our proposed framework on an RAF Compound database and achieve a recognition rate of 66.97%. Experiments show that integrating different models, which express different information, and achieving end-to-end training can quickly and effectively improve the performance of the recognition. MDPI 2020-08-21 /pmc/articles/PMC7506941/ /pubmed/32825666 http://dx.doi.org/10.3390/s20174727 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Li, Hongfei Li, Qing End-to-End Training for Compound Expression Recognition
title	End-to-End Training for Compound Expression Recognition
title_full	End-to-End Training for Compound Expression Recognition
title_fullStr	End-to-End Training for Compound Expression Recognition
title_full_unstemmed	End-to-End Training for Compound Expression Recognition
title_short	End-to-End Training for Compound Expression Recognition
title_sort	end-to-end training for compound expression recognition
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7506941/ https://www.ncbi.nlm.nih.gov/pubmed/32825666 http://dx.doi.org/10.3390/s20174727
work_keys_str_mv	AT lihongfei endtoendtrainingforcompoundexpressionrecognition AT liqing endtoendtrainingforcompoundexpressionrecognition

End-to-End Training for Compound Expression Recognition

Ejemplares similares