
Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN

Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve...


Bibliographic Details
Main authors: Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai; Zhou, Jiehan; Zhang, Weishan
Format: Online Article Text
Language: English
Published: MDPI 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5539696/
https://www.ncbi.nlm.nih.gov/pubmed/28737705
http://dx.doi.org/10.3390/s17071694
author Zhu, Lianzhang
Chen, Leiming
Zhao, Dehai
Zhou, Jiehan
Zhang, Weishan
author_sort Zhu, Lianzhang
collection PubMed
description Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed.
format Online
Article
Text
id pubmed-5539696
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5539696 2017-08-11 Sensors (Basel) Article MDPI 2017-07-24 /pmc/articles/PMC5539696/ /pubmed/28737705 http://dx.doi.org/10.3390/s17071694 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN
topic Article