
Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS

The existing research on emotion recognition commonly uses mel spectrogram (MelSpec) and Geneva minimalistic acoustic parameter set (GeMAPS) as acoustic parameters to learn the audio features. MelSpec can represent the time-series variations of each frequency but cannot manage multiple types of audio features. On the other hand, GeMAPS can handle multiple audio features but fails to provide information on their time-series variations. Thus, this study proposes a speech emotion recognition model based on a multi-input deep neural network that simultaneously learns these two audio features. The proposed model comprises three parts, specifically, for learning MelSpec in image format, learning GeMAPS in vector format, and integrating them to predict the emotion. Additionally, a focal loss function is introduced to address the imbalanced data problem among the emotion classes. The results of the recognition experiments demonstrate weighted and unweighted accuracies of 0.6657 and 0.6149, respectively, which are higher than or comparable to those of the existing state-of-the-art methods. Overall, the proposed model significantly improves the recognition accuracy of the emotion “happiness”, which has been difficult to identify in previous studies owing to limited data. Therefore, the proposed model can effectively recognize emotions from speech and can be applied for practical purposes with future development.
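The abstract describes a multi-input architecture (a branch for MelSpec images, a branch for the GeMAPS feature vector, and a fusion stage) trained with a focal loss, but the record itself contains no code. The sketch below is an illustrative reconstruction only, not the authors' implementation: the input shapes, layer sizes, feature dimensionality, and number of emotion classes are assumptions made for demonstration, written in TensorFlow/Keras.

```python
# Minimal sketch of a multi-input emotion recognizer: a CNN branch for
# MelSpec images plus an MLP branch for GeMAPS features, fused and trained
# with a focal loss. All shapes/sizes below are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 4   # assumed number of emotion classes
GEMAPS_DIM = 62   # assumed; depends on the GeMAPS variant used

# Branch 1: MelSpec treated as a single-channel image.
melspec_in = layers.Input(shape=(128, 128, 1), name="melspec")
x = layers.Conv2D(32, 3, activation="relu")(melspec_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Branch 2: GeMAPS functionals as a flat feature vector.
gemaps_in = layers.Input(shape=(GEMAPS_DIM,), name="gemaps")
y = layers.Dense(64, activation="relu")(gemaps_in)
y = layers.Dense(32, activation="relu")(y)

# Fusion: concatenate both representations and predict the emotion class.
z = layers.Concatenate()([x, y])
z = layers.Dense(64, activation="relu")(z)
out = layers.Dense(NUM_CLASSES, activation="softmax")(z)

model = Model(inputs=[melspec_in, gemaps_in], outputs=out)

def focal_loss(gamma=2.0):
    """Multi-class focal loss: down-weights well-classified examples so that
    minority emotion classes (e.g., happiness) contribute more to training."""
    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0)
        ce = -y_true * tf.math.log(y_pred)        # per-class cross-entropy
        weight = tf.pow(1.0 - y_pred, gamma)      # focusing factor
        return tf.reduce_sum(weight * ce, axis=-1)
    return loss

model.compile(optimizer="adam", loss=focal_loss(gamma=2.0), metrics=["accuracy"])
```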


Bibliographic Details
Main Authors: Toyoshima, Itsuki; Okada, Yoshifumi; Ishimaru, Momoko; Uchiyama, Ryunosuke; Tada, Mayu
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects: Brief Report
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9920472/
https://www.ncbi.nlm.nih.gov/pubmed/36772782
http://dx.doi.org/10.3390/s23031743
Journal: Sensors (Basel)
Article Type: Brief Report
Publication Date: 2023-02-03
Collection: PubMed
Record ID: pubmed-9920472
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).