
Bidirectional parallel echo state network for speech emotion recognition

Speech is an effective way of communicating and exchanging complex information between humans. The speech signal has attracted great attention in human-computer interaction, and emotion recognition from speech has therefore become a hot research topic in human-machine interaction. In...


Bibliographic Details
Main Authors:	Ibrahim, Hemin; Loo, Chu Kiong; Alnajjar, Fady
Format:	Online Article Text
Language:	English
Published:	Springer London 2022
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9152839/ https://www.ncbi.nlm.nih.gov/pubmed/35669535 http://dx.doi.org/10.1007/s00521-022-07410-2

author	Ibrahim, Hemin; Loo, Chu Kiong; Alnajjar, Fady
collection	PubMed
description	Speech is an effective way of communicating and exchanging complex information between humans. The speech signal has attracted great attention in human-computer interaction, and emotion recognition from speech has therefore become a hot research topic in human-machine interaction. In this paper, we propose a novel speech emotion recognition (SER) system that adopts a multivariate time series representation of handcrafted features extracted from speech signals. A bidirectional echo state network (ESN) with two parallel reservoir layers is applied to capture additional independent information. The parallel reservoirs produce multiple representations for each direction of the bidirectional data, with two stages of concatenation. Sparse random projection is adopted to reduce the high-dimensional sparse output of both reservoirs, separately for each direction. Random over-sampling and random under-sampling are used to address the class imbalance of the speech emotion datasets. The performance of the proposed parallel ESN model is evaluated in speaker-independent experiments on the EMO-DB, SAVEE, RAVDESS, and FAU Aibo datasets. The results show that the proposed SER model outperforms the single-reservoir model and state-of-the-art studies.
format	Online Article Text
id	pubmed-9152839
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer London
record_format	MEDLINE/PubMed
spelling	pubmed-9152839 2022-06-02 Bidirectional parallel echo state network for speech emotion recognition Ibrahim, Hemin; Loo, Chu Kiong; Alnajjar, Fady Neural Comput Appl Original Article Springer London 2022-05-31 2022 /pmc/articles/PMC9152839/ /pubmed/35669535 http://dx.doi.org/10.1007/s00521-022-07410-2 Text en © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	Bidirectional parallel echo state network for speech emotion recognition
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9152839/ https://www.ncbi.nlm.nih.gov/pubmed/35669535 http://dx.doi.org/10.1007/s00521-022-07410-2
