Progressive distribution adapted neural networks for cross-corpus speech emotion recognition

In this paper, we investigate a challenging but interesting task in the research of speech emotion recognition (SER), i.e., cross-corpus SER. Unlike the conventional SER, the training (source) and testing (target) samples in cross-corpus SER come from different speech corpora, which results in a fea...

Bibliographic Details
Main authors:	Zong, Yuan; Lian, Hailun; Zhang, Jiacheng; Feng, Ercui; Lu, Cheng; Chang, Hongli; Tang, Chuangao
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Neuroscience
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520908/ https://www.ncbi.nlm.nih.gov/pubmed/36187564 http://dx.doi.org/10.3389/fnbot.2022.987146

author	Zong, Yuan; Lian, Hailun; Zhang, Jiacheng; Feng, Ercui; Lu, Cheng; Chang, Hongli; Tang, Chuangao
collection	PubMed
description	In this paper, we investigate a challenging but interesting task in the research of speech emotion recognition (SER), i.e., cross-corpus SER. Unlike the conventional SER, the training (source) and testing (target) samples in cross-corpus SER come from different speech corpora, which results in a feature distribution mismatch between them. Hence, the performance of most existing SER methods may sharply decrease. To cope with this problem, we propose a simple yet effective deep transfer learning method called progressive distribution adapted neural networks (PDAN). PDAN employs convolutional neural networks (CNN) as the backbone and the speech spectrum as the inputs to achieve an end-to-end learning framework. More importantly, its basic idea for solving cross-corpus SER is very straightforward, i.e., enhancing the backbone's corpus invariant feature learning ability by incorporating a progressive distribution adapted regularization term into the original loss function to guide the network training. To evaluate the proposed PDAN, extensive cross-corpus SER experiments on speech emotion corpora including EmoDB, eNTERFACE, and CASIA are conducted. Experimental results showed that the proposed PDAN outperforms most well-performing deep and subspace transfer learning methods in dealing with the cross-corpus SER tasks.
format	Online Article Text
id	pubmed-9520908
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9520908 2022-09-30 Front Neurorobot Neuroscience Frontiers Media S.A. 2022-09-15 /pmc/articles/PMC9520908/ /pubmed/36187564 http://dx.doi.org/10.3389/fnbot.2022.987146 Text en Copyright © 2022 Zong, Lian, Zhang, Feng, Lu, Chang and Tang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Progressive distribution adapted neural networks for cross-corpus speech emotion recognition
topic	Neuroscience
