Articulation constrained learning with application to speech emotion recognition
Speech emotion recognition methods that combine articulatory information with acoustic features have previously been shown to improve recognition performance. However, collecting articulatory data on a large scale may not be feasible in many scenarios, which restricts the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional ℓ1-regularized logistic regression cost function is extended with additional constraints that require the model to reconstruct articulatory data, yielding sparse, interpretable representations optimized jointly for both tasks. Furthermore, the model requires articulatory features only during training; only speech features are needed for inference on out-of-sample data. Experiments evaluate emotion recognition performance over the vowels /AA/, /AE/, /IY/, and /UW/ and over complete utterances. Incorporating articulatory information is shown to significantly improve performance for valence-based classification. Results obtained for within-corpus and cross-corpus categorical emotion recognition indicate that the proposed method is more effective at distinguishing happiness from other emotions.
Main Authors: Shah, Mohit; Tu, Ming; Berisha, Visar; Chakrabarti, Chaitali; Spanias, Andreas
Format: Online Article Text
Language: English
Published: Springer International Publishing, 2019
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6919554/ https://www.ncbi.nlm.nih.gov/pubmed/31853252 http://dx.doi.org/10.1186/s13636-019-0157-9
_version_ | 1783480774518898688 |
author | Shah, Mohit; Tu, Ming; Berisha, Visar; Chakrabarti, Chaitali; Spanias, Andreas
author_facet | Shah, Mohit; Tu, Ming; Berisha, Visar; Chakrabarti, Chaitali; Spanias, Andreas
author_sort | Shah, Mohit |
collection | PubMed |
description | Speech emotion recognition methods that combine articulatory information with acoustic features have previously been shown to improve recognition performance. However, collecting articulatory data on a large scale may not be feasible in many scenarios, which restricts the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional ℓ1-regularized logistic regression cost function is extended with additional constraints that require the model to reconstruct articulatory data, yielding sparse, interpretable representations optimized jointly for both tasks. Furthermore, the model requires articulatory features only during training; only speech features are needed for inference on out-of-sample data. Experiments evaluate emotion recognition performance over the vowels /AA/, /AE/, /IY/, and /UW/ and over complete utterances. Incorporating articulatory information is shown to significantly improve performance for valence-based classification. Results obtained for within-corpus and cross-corpus categorical emotion recognition indicate that the proposed method is more effective at distinguishing happiness from other emotions.
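The abstract describes the objective only at a high level. Below is a minimal sketch of one plausible reading of it, not the authors' implementation: an ℓ1-regularized logistic regression loss augmented with a term that asks the same acoustic features to also reconstruct articulatory data. The linear reconstruction map `V`, the hyperparameters `lam` and `mu`, and the simple additive coupling between the two losses are all assumptions for illustration; the paper's exact constraint formulation may differ.

```python
# Hypothetical sketch (not the authors' code): l1-regularized logistic
# regression whose acoustic features are additionally constrained to
# reconstruct articulatory data during training.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_objective(w, V, X, y, A, lam=0.1, mu=0.1):
    """Logistic loss + l1 sparsity + articulatory reconstruction penalty.

    X : (n, d) acoustic features      y : (n,) emotion labels in {0, 1}
    A : (n, k) articulatory features  w : (d,) classifier weights
    V : (d, k) assumed linear map from acoustic to articulatory space
    """
    eps = 1e-12
    p = sigmoid(X @ w)
    log_loss = -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    sparsity = lam * np.sum(np.abs(w))
    # Training-only term: articulatory data A must be recoverable from the
    # acoustic features, tying the learned representation to articulation.
    recon = mu * np.mean((X @ V - A) ** 2)
    return log_loss + sparsity + recon

# At inference time no articulatory data is needed; prediction uses the
# acoustic features and the learned classifier weights alone:
#     y_hat = sigmoid(X_test @ w) > 0.5
```

Because the reconstruction term involves only training data, this kind of objective matches the abstract's claim that articulatory features are required during training while inference on out-of-sample data uses speech features alone.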
format | Online Article Text |
id | pubmed-6919554 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-6919554 2019-12-18 Articulation constrained learning with application to speech emotion recognition Shah, Mohit; Tu, Ming; Berisha, Visar; Chakrabarti, Chaitali; Spanias, Andreas EURASIP J Audio Speech Music Process Research Speech emotion recognition methods that combine articulatory information with acoustic features have previously been shown to improve recognition performance. However, collecting articulatory data on a large scale may not be feasible in many scenarios, which restricts the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional ℓ1-regularized logistic regression cost function is extended with additional constraints that require the model to reconstruct articulatory data, yielding sparse, interpretable representations optimized jointly for both tasks. Furthermore, the model requires articulatory features only during training; only speech features are needed for inference on out-of-sample data. Experiments evaluate emotion recognition performance over the vowels /AA/, /AE/, /IY/, and /UW/ and over complete utterances. Incorporating articulatory information is shown to significantly improve performance for valence-based classification. Results obtained for within-corpus and cross-corpus categorical emotion recognition indicate that the proposed method is more effective at distinguishing happiness from other emotions. Springer International Publishing 2019-08-20 2019 /pmc/articles/PMC6919554/ /pubmed/31853252 http://dx.doi.org/10.1186/s13636-019-0157-9 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle | Research; Shah, Mohit; Tu, Ming; Berisha, Visar; Chakrabarti, Chaitali; Spanias, Andreas; Articulation constrained learning with application to speech emotion recognition
title | Articulation constrained learning with application to speech emotion recognition |
title_full | Articulation constrained learning with application to speech emotion recognition |
title_fullStr | Articulation constrained learning with application to speech emotion recognition |
title_full_unstemmed | Articulation constrained learning with application to speech emotion recognition |
title_short | Articulation constrained learning with application to speech emotion recognition |
title_sort | articulation constrained learning with application to speech emotion recognition |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6919554/ https://www.ncbi.nlm.nih.gov/pubmed/31853252 http://dx.doi.org/10.1186/s13636-019-0157-9 |
work_keys_str_mv | AT shahmohit articulationconstrainedlearningwithapplicationtospeechemotionrecognition AT tuming articulationconstrainedlearningwithapplicationtospeechemotionrecognition AT berishavisar articulationconstrainedlearningwithapplicationtospeechemotionrecognition AT chakrabartichaitali articulationconstrainedlearningwithapplicationtospeechemotionrecognition AT spaniasandreas articulationconstrainedlearningwithapplicationtospeechemotionrecognition |