On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition

Many speech emotion recognition systems have been designed using different features and classification methods. Still, there is a lack of knowledge and reasoning regarding the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect the accuracy...

Bibliographic Details
Main Authors:	Kacur, Juraj, Puterka, Boris, Pavlovicova, Jarmila, Oravec, Milos
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7962835/ https://www.ncbi.nlm.nih.gov/pubmed/33800348 http://dx.doi.org/10.3390/s21051888

author	Kacur, Juraj Puterka, Boris Pavlovicova, Jarmila Oravec, Milos
author_sort	Kacur, Juraj
collection	PubMed
description	Many speech emotion recognition systems have been designed using different features and classification methods. Still, there is a lack of knowledge and reasoning regarding the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect the accuracy, and to what extent. This study aims to extend the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and the lengths and overlaps of classification regions), frequency ranges, frequency scales, processing of whole speech (spectrograms), vocal tract (filter banks, linear prediction coefficient (LPC) modeling), and excitation (inverse LPC filtering) signals, magnitude and phase manipulations, cepstral features, etc. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross-validation, the paired t-test, and rank and Pearson correlations. The results revealed several settings in a 75% accuracy range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0–8 kHz frequency range. Spectrograms carrying vocal tract and excitation information also scored well. It was found that even basic processing like pre-emphasis, segmentation, magnitude modifications, etc., can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
format	Online Article Text
id	pubmed-7962835
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7962835 2021-03-17 On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition Kacur, Juraj Puterka, Boris Pavlovicova, Jarmila Oravec, Milos Sensors (Basel) Article MDPI 2021-03-08 /pmc/articles/PMC7962835/ /pubmed/33800348 http://dx.doi.org/10.3390/s21051888 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7962835/ https://www.ncbi.nlm.nih.gov/pubmed/33800348 http://dx.doi.org/10.3390/s21051888
