
Articulation constrained learning with application to speech emotion recognition

Speech emotion recognition methods that combine articulatory information with acoustic features have previously been shown to improve recognition performance. Collecting articulatory data on a large scale may not be feasible in many scenarios, which restricts the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional ℓ1-regularized logistic regression cost function is extended with additional constraints that require the model to reconstruct articulatory data, yielding sparse, interpretable representations jointly optimized for both tasks. Furthermore, the model requires articulatory features only during training; inference on out-of-sample data uses speech features alone. Experiments evaluate emotion recognition performance over the vowels /AA/, /AE/, /IY/, /UW/ and over complete utterances. Incorporating articulatory information significantly improves performance for valence-based classification. Results for within-corpus and cross-corpus categorical emotion recognition indicate that the proposed method is more effective at distinguishing happiness from other emotions.
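As a rough illustration of the kind of objective the abstract describes (a sketch only; the specific constraint form, the reconstruction map A, and the trade-off weights λ and γ are assumptions, not taken from the paper), the ℓ1-regularized logistic regression loss can be augmented with an articulatory reconstruction penalty:

\min_{w,\,A} \;\sum_{i=1}^{N} \log\!\left(1 + e^{-y_i\, w^{\top} x_i}\right) \;+\; \lambda \lVert w \rVert_1 \;+\; \gamma \sum_{i=1}^{N} \lVert z_i - A x_i \rVert_2^2

Here x_i are acoustic feature vectors, y_i ∈ {−1, +1} emotion labels, and z_i articulatory measurements available only at training time. At inference the prediction sign(w^{\top} x) depends on speech features alone, which matches the train/test asymmetry described above.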


Bibliographic Details
Main Authors: Shah, Mohit; Tu, Ming; Berisha, Visar; Chakrabarti, Chaitali; Spanias, Andreas
Format: Online Article Text
Language: English
Journal: EURASIP J Audio Speech Music Process
Published: Springer International Publishing, 20 August 2019
Subjects: Research
Collection: PubMed (ID: pubmed-6919554)
License: © The Author(s) 2019. Open Access under the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided appropriate credit is given to the original author(s) and the source, a link to the license is provided, and any changes are indicated.
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6919554/
https://www.ncbi.nlm.nih.gov/pubmed/31853252
http://dx.doi.org/10.1186/s13636-019-0157-9