Affective Latent Representation of Acoustic and Lexical Features for Emotion Recognition

In this paper, we propose a novel emotion recognition method based on the underlying emotional characteristics extracted from a conditional adversarial auto-encoder (CAAE), in which both acoustic and lexical features are used as inputs. The acoustic features are generated by calculating statistical...

Bibliographic Details
Main Authors: Kim, Eesung; Song, Hyungchan; Shin, Jong Won
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7248815/
https://www.ncbi.nlm.nih.gov/pubmed/32375342
http://dx.doi.org/10.3390/s20092614
_version_ 1783538458199851008
author Kim, Eesung
Song, Hyungchan
Shin, Jong Won
author_facet Kim, Eesung
Song, Hyungchan
Shin, Jong Won
author_sort Kim, Eesung
collection PubMed
description In this paper, we propose a novel emotion recognition method based on the underlying emotional characteristics extracted from a conditional adversarial auto-encoder (CAAE), in which both acoustic and lexical features are used as inputs. The acoustic features are generated by calculating statistical functionals of low-level descriptors and by a deep neural network (DNN). These acoustic features are concatenated with three types of lexical features extracted from the text: a sparse representation, a distributed representation, and affective lexicon-based dimensions. The CAAE yields two-dimensional latent representations similar to vectors in the valence-arousal space, which can be directly mapped into the emotional classes without the need for a sophisticated classifier. In contrast to the previous attempt that used a CAAE with only acoustic features, the proposed approach could enhance emotion recognition performance because the combined acoustic and lexical features provide enough discriminant power. Experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus showed that our method outperformed the previously reported best results on the same corpus, achieving an unweighted average recall of 76.72%.
format Online
Article
Text
id pubmed-7248815
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-72488152020-06-10 Affective Latent Representation of Acoustic and Lexical Features for Emotion Recognition Kim, Eesung Song, Hyungchan Shin, Jong Won Sensors (Basel) Article In this paper, we propose a novel emotion recognition method based on the underlying emotional characteristics extracted from a conditional adversarial auto-encoder (CAAE), in which both acoustic and lexical features are used as inputs. The acoustic features are generated by calculating statistical functionals of low-level descriptors and by a deep neural network (DNN). These acoustic features are concatenated with three types of lexical features extracted from the text, which are a sparse representation, a distributed representation, and an affective lexicon-based dimensions. Two-dimensional latent representations similar to vectors in the valence-arousal space are obtained by a CAAE, which can be directly mapped into the emotional classes without the need for a sophisticated classifier. In contrast to the previous attempt to a CAAE using only acoustic features, the proposed approach could enhance the performance of the emotion recognition because combined acoustic and lexical features provide enough discriminant power. Experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus showed that our method outperformed the previously reported best results on the same corpus, achieving 76.72% in the unweighted average recall. MDPI 2020-05-04 /pmc/articles/PMC7248815/ /pubmed/32375342 http://dx.doi.org/10.3390/s20092614 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Kim, Eesung
Song, Hyungchan
Shin, Jong Won
Affective Latent Representation of Acoustic and Lexical Features for Emotion Recognition
title Affective Latent Representation of Acoustic and Lexical Features for Emotion Recognition
title_full Affective Latent Representation of Acoustic and Lexical Features for Emotion Recognition
title_fullStr Affective Latent Representation of Acoustic and Lexical Features for Emotion Recognition
title_full_unstemmed Affective Latent Representation of Acoustic and Lexical Features for Emotion Recognition
title_short Affective Latent Representation of Acoustic and Lexical Features for Emotion Recognition
title_sort affective latent representation of acoustic and lexical features for emotion recognition
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7248815/
https://www.ncbi.nlm.nih.gov/pubmed/32375342
http://dx.doi.org/10.3390/s20092614
work_keys_str_mv AT kimeesung affectivelatentrepresentationofacousticandlexicalfeaturesforemotionrecognition
AT songhyungchan affectivelatentrepresentationofacousticandlexicalfeaturesforemotionrecognition
AT shinjongwon affectivelatentrepresentationofacousticandlexicalfeaturesforemotionrecognition