Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation
To solve the problem of feature distribution discrepancy in cross-corpus speech emotion recognition tasks, this paper proposes an emotion recognition model based on multi-task learning and subdomain adaptation, which alleviates the impact of this discrepancy on emotion recognition. Existing methods have shortcomings in...
| Main Authors: | Fu, Hongliang; Zhuang, Zhihao; Wang, Yang; Huang, Chen; Duan, Wenzhuo |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | MDPI, 2023 |
| Subjects: | |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9858266/ https://www.ncbi.nlm.nih.gov/pubmed/36673265 http://dx.doi.org/10.3390/e25010124 |
_version_ | 1784874056192884736 |
---|---|
author | Fu, Hongliang Zhuang, Zhihao Wang, Yang Huang, Chen Duan, Wenzhuo |
author_facet | Fu, Hongliang Zhuang, Zhihao Wang, Yang Huang, Chen Duan, Wenzhuo |
author_sort | Fu, Hongliang |
collection | PubMed |
description | To solve the problem of feature distribution discrepancy in cross-corpus speech emotion recognition tasks, this paper proposes an emotion recognition model based on multi-task learning and subdomain adaptation, which alleviates the impact of this discrepancy on emotion recognition. Existing methods have shortcomings in speech feature representation and in cross-corpus feature distribution alignment. The proposed model uses a deep denoising auto-encoder as the shared feature extraction network for multi-task learning, and a fully connected layer and a softmax layer are added for each recognition task as task-specific layers. A subdomain adaptation algorithm for emotion and gender features is then applied to the shared network to obtain shared emotion and gender features for the source and target domains. Multi-task learning enhances the representational power of the features, while the subdomain adaptation algorithm improves their transferability and alleviates the impact of feature distribution differences on the emotional features. Averaged over six cross-corpus speech emotion recognition experiments, the weighted average recall of the proposed model is 1.89~10.07% higher than that of competing models, which verifies its validity. |
format | Online Article Text |
id | pubmed-9858266 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9858266 2023-01-21 Entropy (Basel) Article MDPI 2023-01-07 /pmc/articles/PMC9858266/ /pubmed/36673265 http://dx.doi.org/10.3390/e25010124 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Fu, Hongliang Zhuang, Zhihao Wang, Yang Huang, Chen Duan, Wenzhuo Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation |
title | Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation |
title_full | Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation |
title_fullStr | Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation |
title_full_unstemmed | Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation |
title_short | Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation |
title_sort | cross-corpus speech emotion recognition based on multi-task learning and subdomain adaptation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9858266/ https://www.ncbi.nlm.nih.gov/pubmed/36673265 http://dx.doi.org/10.3390/e25010124 |
work_keys_str_mv | AT fuhongliang crosscorpusspeechemotionrecognitionbasedonmultitasklearningandsubdomainadaptation AT zhuangzhihao crosscorpusspeechemotionrecognitionbasedonmultitasklearningandsubdomainadaptation AT wangyang crosscorpusspeechemotionrecognitionbasedonmultitasklearningandsubdomainadaptation AT huangchen crosscorpusspeechemotionrecognitionbasedonmultitasklearningandsubdomainadaptation AT duanwenzhuo crosscorpusspeechemotionrecognitionbasedonmultitasklearningandsubdomainadaptation |
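The description in the record above outlines a concrete pipeline: a deep denoising auto-encoder acts as a shared feature extractor, task-specific fully connected + softmax heads handle emotion and gender recognition, and a subdomain adaptation term aligns class-conditional feature distributions between the source and target corpora. The sketch below shows one way such a model could be wired up in PyTorch. It is not the authors' code: the feature dimension, layer sizes, class counts, loss weights, and the local-MMD-style (LMMD) formulation of the subdomain adaptation term are all assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' released code): a shared denoising
# auto-encoder feeds two task-specific heads (emotion, gender), and an
# LMMD-style loss aligns per-class (subdomain) feature distributions between
# the source and target corpora. All sizes and weights are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise RBF kernel matrix between the rows of a and b."""
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))


def lmmd_loss(s_feat, t_feat, s_labels, t_prob, num_classes, sigma=1.0):
    """Local (subdomain) MMD: a class-weighted MMD that matches each source
    emotion subdomain with the corresponding pseudo-labelled target subdomain."""
    s_onehot = F.one_hot(s_labels, num_classes).float()
    ws = s_onehot / s_onehot.sum(dim=0, keepdim=True).clamp_min(1e-6)  # per-class weights
    wt = t_prob / t_prob.sum(dim=0, keepdim=True).clamp_min(1e-6)
    k_ss = gaussian_kernel(s_feat, s_feat, sigma)
    k_tt = gaussian_kernel(t_feat, t_feat, sigma)
    k_st = gaussian_kernel(s_feat, t_feat, sigma)
    loss = (((ws @ ws.T) * k_ss).sum() + ((wt @ wt.T) * k_tt).sum()
            - 2 * ((ws @ wt.T) * k_st).sum())
    return loss / num_classes


class SharedDenoisingEncoder(nn.Module):
    """Shared feature extractor: the encoder of a denoising auto-encoder;
    the decoder supplies the reconstruction objective during training."""

    def __init__(self, in_dim=1582, hidden=512, feat_dim=128, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, feat_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, in_dim))

    def forward(self, x):
        if self.training:                      # corrupt inputs only while training
            x = x + self.noise_std * torch.randn_like(x)
        feat = self.encoder(x)
        return feat, self.decoder(feat)        # shared features + reconstruction


class MultiTaskSER(nn.Module):
    """Shared encoder with task-specific fully connected heads; softmax is
    applied inside the losses and when forming pseudo-labels."""

    def __init__(self, num_emotions=4, num_genders=2, feat_dim=128):
        super().__init__()
        self.backbone = SharedDenoisingEncoder(feat_dim=feat_dim)
        self.emotion_head = nn.Linear(feat_dim, num_emotions)
        self.gender_head = nn.Linear(feat_dim, num_genders)

    def forward(self, x):
        feat, recon = self.backbone(x)
        return feat, recon, self.emotion_head(feat), self.gender_head(feat)


def total_loss(model, src_x, src_emo, src_gen, tgt_x, num_emotions=4):
    """Supervised task losses on the labelled source corpus + reconstruction
    on both corpora + subdomain alignment of the emotion features (a gender
    alignment term would be formed analogously)."""
    ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
    s_feat, s_rec, s_emo_logit, s_gen_logit = model(src_x)
    t_feat, t_rec, t_emo_logit, _ = model(tgt_x)

    l_task = ce(s_emo_logit, src_emo) + ce(s_gen_logit, src_gen)
    l_recon = mse(s_rec, src_x) + mse(t_rec, tgt_x)
    # Target emotion labels are unavailable, so softmax outputs serve as
    # pseudo-label probabilities that define the target subdomains.
    l_align = lmmd_loss(s_feat, t_feat, src_emo,
                        torch.softmax(t_emo_logit, dim=1), num_emotions)
    return l_task + 0.5 * l_recon + l_align    # trade-off weights are assumptions
```

Using the model's own softmax outputs as pseudo-labels to define the target subdomains is the standard way to make class-conditional alignment possible when the target corpus is unlabelled. The record does not state how the reconstruction, gender, and adaptation terms are weighted, so the weights above are placeholders.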