
Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora


Bibliographic Details
Main Authors: Zong, Yuan, Lian, Hailun, Chang, Hongli, Lu, Cheng, Tang, Chuangao
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9497589/
https://www.ncbi.nlm.nih.gov/pubmed/36141136
http://dx.doi.org/10.3390/e24091250
_version_ 1784794543222161408
author Zong, Yuan
Lian, Hailun
Chang, Hongli
Lu, Cheng
Tang, Chuangao
author_facet Zong, Yuan
Lian, Hailun
Chang, Hongli
Lu, Cheng
Tang, Chuangao
author_sort Zong, Yuan
collection PubMed
description In this paper, we focus on a challenging but interesting task in speech emotion recognition (SER): cross-corpus SER. Unlike conventional SER, a feature distribution mismatch may exist between the labeled source (training) and target (testing) speech samples in cross-corpus SER because they come from different speech emotion corpora, which degrades the performance of most well-performing SER methods. To address this issue, we propose a novel transfer subspace learning method called multiple distribution-adapted regression (MDAR) to bridge the gap between speech samples from different corpora. Specifically, MDAR learns a projection matrix that builds the relationship between the source speech features and emotion labels. A novel regularization term called multiple distribution adaptation (MDA), consisting of one marginal and two conditional distribution-adapted operations, is designed to make this discriminative projection matrix applicable to the target speech samples regardless of corpus variance. Consequently, the learned projection matrix allows us to predict the emotion labels of target speech samples when only the source label information is given. To evaluate the proposed MDAR method, extensive cross-corpus SER tasks based on three different speech emotion corpora, i.e., EmoDB, eNTERFACE, and CASIA, were designed. Experimental results showed that MDAR outperformed recent state-of-the-art transfer subspace learning methods and even performed better than several well-performing deep transfer learning methods on cross-corpus SER tasks.
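The general recipe the abstract describes (a regression loss tying source features to emotion labels, plus a distribution-adaptation regularizer so the learned projection also suits unlabeled target samples) can be sketched as follows. This is a hypothetical, simplified illustration, not the authors' implementation: it uses only a marginal MMD-style penalty (the paper's MDA term additionally includes two conditional distribution-adapted operations), and the names `mdar_like_fit`, `mmd_matrix`, and the hyperparameters `lam`/`mu` are illustrative.

```python
import numpy as np

def mmd_matrix(ns, nt):
    """MMD coefficient matrix M for marginal distribution alignment.

    For pooled features X (source rows first), tr(W^T X^T M X W) equals the
    squared distance between the projected source mean and target mean.
    """
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    return np.outer(e, e)  # shape (ns+nt, ns+nt)

def mdar_like_fit(Xs, Ys, Xt, lam=1.0, mu=1.0):
    """Learn a projection W minimizing
        ||Xs W - Ys||^2 + lam * ||W||^2 + mu * tr(W^T X^T M X W),
    i.e., ridge regression on labeled source data regularized so that the
    projected source and target marginal distributions have matching means.

    Xs: (ns, d) labeled source features; Ys: (ns, c) one-hot labels;
    Xt: (nt, d) unlabeled target features. Returns W of shape (d, c).
    """
    X = np.vstack([Xs, Xt])                  # pooled source + target features
    M = mmd_matrix(Xs.shape[0], Xt.shape[0])
    d = Xs.shape[1]
    # Closed-form solution of the regularized least-squares objective.
    A = Xs.T @ Xs + lam * np.eye(d) + mu * X.T @ M @ X
    return np.linalg.solve(A, Xs.T @ Ys)
```

Target labels would then be predicted as `np.argmax(Xt @ W, axis=1)`. The design point to note is that the MMD term only needs target *features*, never target labels, which is what makes the approach applicable to the unlabeled target corpus.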
format Online
Article
Text
id pubmed-9497589
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-94975892022-09-23 Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora Zong, Yuan Lian, Hailun Chang, Hongli Lu, Cheng Tang, Chuangao Entropy (Basel) Article In this paper, we focus on a challenging, but interesting, task in speech emotion recognition (SER), i.e., cross-corpus SER. Unlike conventional SER, a feature distribution mismatch may exist between the labeled source (training) and target (testing) speech samples in cross-corpus SER because they come from different speech emotion corpora, which degrades the performance of most well-performing SER methods. To address this issue, we propose a novel transfer subspace learning method called multiple distribution-adapted regression (MDAR) to bridge the gap between speech samples from different corpora. Specifically, MDAR aims to learn a projection matrix to build the relationship between the source speech features and emotion labels. A novel regularization term called multiple distribution adaption (MDA), consisting of a marginal and two conditional distribution-adapted operations, is designed to collaboratively enable such a discriminative projection matrix to be applicable to the target speech samples, regardless of speech corpus variance. Consequently, by resorting to the learned projection matrix, we are able to predict the emotion labels of target speech samples when only the source label information is given. To evaluate the proposed MDAR method, extensive cross-corpus SER tasks based on three different speech emotion corpora, i.e., EmoDB, eNTERFACE, and CASIA, were designed. Experimental results showed that the proposed MDAR outperformed most recent state-of-the-art transfer subspace learning methods and even performed better than several well-performing deep transfer learning methods in dealing with cross-corpus SER tasks. MDPI 2022-09-05 /pmc/articles/PMC9497589/ /pubmed/36141136 http://dx.doi.org/10.3390/e24091250 Text en © 2022 by the authors. 
https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Zong, Yuan
Lian, Hailun
Chang, Hongli
Lu, Cheng
Tang, Chuangao
Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora
title Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora
title_full Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora
title_fullStr Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora
title_full_unstemmed Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora
title_short Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora
title_sort adapting multiple distributions for bridging emotions from different speech corpora
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9497589/
https://www.ncbi.nlm.nih.gov/pubmed/36141136
http://dx.doi.org/10.3390/e24091250
work_keys_str_mv AT zongyuan adaptingmultipledistributionsforbridgingemotionsfromdifferentspeechcorpora
AT lianhailun adaptingmultipledistributionsforbridgingemotionsfromdifferentspeechcorpora
AT changhongli adaptingmultipledistributionsforbridgingemotionsfromdifferentspeechcorpora
AT lucheng adaptingmultipledistributionsforbridgingemotionsfromdifferentspeechcorpora
AT tangchuangao adaptingmultipledistributionsforbridgingemotionsfromdifferentspeechcorpora