Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning
Recognizing human emotions by machines is a complex task. Deep learning models attempt to automate this process by enabling machines to exhibit learning capabilities. However, identifying human emotions from speech with good performance is still challenging. With the advent of deep learning algorithms, this problem has recently been addressed; however, most past research has focused on a single method of feature extraction for training. In this research, we explore two different methods of extracting features for effective speech emotion recognition. First, two-way feature extraction is proposed, utilizing super convergence to extract two sets of potential features from the speech data. In the first approach, principal component analysis (PCA) is applied to the speech features to obtain the first feature set, and a deep neural network (DNN) with dense and dropout layers is then implemented. In the second approach, mel-spectrogram images are extracted from the audio files, and these 2D images are given as input to the pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis of both feature extraction methods, with multiple algorithms and over two datasets, are performed in this work. On the RAVDESS dataset, the mel-spectrogram images on the pre-trained VGG-16 provided significantly better accuracy than the numeric features on a DNN.
Main Authors: Aggarwal, Apeksha; Srivastava, Akshat; Agarwal, Ajay; Chahal, Nidhi; Singh, Dilbag; Alnuaim, Abeer Ali; Alhadlaq, Aseel; Lee, Heung-No
Format: Online Article Text
Language: English
Published: MDPI, 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8949356/
https://www.ncbi.nlm.nih.gov/pubmed/35336548
http://dx.doi.org/10.3390/s22062378
_version_ | 1784674876056928256 |
author | Aggarwal, Apeksha; Srivastava, Akshat; Agarwal, Ajay; Chahal, Nidhi; Singh, Dilbag; Alnuaim, Abeer Ali; Alhadlaq, Aseel; Lee, Heung-No |
author_facet | Aggarwal, Apeksha; Srivastava, Akshat; Agarwal, Ajay; Chahal, Nidhi; Singh, Dilbag; Alnuaim, Abeer Ali; Alhadlaq, Aseel; Lee, Heung-No |
author_sort | Aggarwal, Apeksha |
collection | PubMed |
description | Recognizing human emotions by machines is a complex task. Deep learning models attempt to automate this process by enabling machines to exhibit learning capabilities. However, identifying human emotions from speech with good performance is still challenging. With the advent of deep learning algorithms, this problem has recently been addressed; however, most past research has focused on a single method of feature extraction for training. In this research, we explore two different methods of extracting features for effective speech emotion recognition. First, two-way feature extraction is proposed, utilizing super convergence to extract two sets of potential features from the speech data. In the first approach, principal component analysis (PCA) is applied to the speech features to obtain the first feature set, and a deep neural network (DNN) with dense and dropout layers is then implemented. In the second approach, mel-spectrogram images are extracted from the audio files, and these 2D images are given as input to the pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis of both feature extraction methods, with multiple algorithms and over two datasets, are performed in this work. On the RAVDESS dataset, the mel-spectrogram images on the pre-trained VGG-16 provided significantly better accuracy than the numeric features on a DNN. |
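As a rough, non-authoritative sketch of the pipeline the description outlines (numeric features reduced by PCA and fed to a dense/dropout DNN, versus mel-spectrogram images fed to a pre-trained VGG-16), the Python code below illustrates both branches. The MFCC feature choice, PCA dimensionality, layer widths, dropout rates, frozen VGG-16 backbone, and the librosa/scikit-learn/Keras stack are all assumptions for illustration; the record does not specify the paper's settings, and the super-convergence training schedule mentioned in the description is omitted.

```python
# Illustrative sketch only: hyperparameters and feature choices below are
# assumptions, not the paper's reported configuration.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

N_CLASSES = 8  # RAVDESS defines 8 emotion categories


def numeric_branch(wav_paths, n_components=30):
    """Branch 1: numeric speech features -> PCA -> DNN (dense + dropout)."""
    feats = []
    for path in wav_paths:
        y, sr = librosa.load(path, sr=None)
        # MFCCs are an assumed numeric feature set; the record does not say
        # which low-level features the authors extracted.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
        feats.append(mfcc.mean(axis=1))  # one 40-d vector per file
    X = PCA(n_components=n_components).fit_transform(np.array(feats))

    dnn = models.Sequential([
        layers.Input(shape=(n_components,)),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])
    dnn.compile(optimizer="adam",
                loss="sparse_categorical_crossentropy",
                metrics=["accuracy"])
    return X, dnn


def spectrogram_branch(wav_path):
    """Branch 2: mel-spectrogram image -> pre-trained VGG-16 classifier."""
    y, sr = librosa.load(wav_path, sr=None)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # In practice the spectrogram would be rendered and resized to a
    # 224x224 RGB image before being fed to VGG-16.

    base = VGG16(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
    base.trainable = False  # assumed: ImageNet backbone kept frozen
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return mel_db, model
```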
format | Online Article Text |
id | pubmed-8949356 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8949356 2022-03-26 Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning Aggarwal, Apeksha; Srivastava, Akshat; Agarwal, Ajay; Chahal, Nidhi; Singh, Dilbag; Alnuaim, Abeer Ali; Alhadlaq, Aseel; Lee, Heung-No Sensors (Basel) Article Recognizing human emotions by machines is a complex task. Deep learning models attempt to automate this process by enabling machines to exhibit learning capabilities. However, identifying human emotions from speech with good performance is still challenging. With the advent of deep learning algorithms, this problem has recently been addressed; however, most past research has focused on a single method of feature extraction for training. In this research, we explore two different methods of extracting features for effective speech emotion recognition. First, two-way feature extraction is proposed, utilizing super convergence to extract two sets of potential features from the speech data. In the first approach, principal component analysis (PCA) is applied to the speech features to obtain the first feature set, and a deep neural network (DNN) with dense and dropout layers is then implemented. In the second approach, mel-spectrogram images are extracted from the audio files, and these 2D images are given as input to the pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis of both feature extraction methods, with multiple algorithms and over two datasets, are performed in this work. On the RAVDESS dataset, the mel-spectrogram images on the pre-trained VGG-16 provided significantly better accuracy than the numeric features on a DNN. MDPI 2022-03-19 /pmc/articles/PMC8949356/ /pubmed/35336548 http://dx.doi.org/10.3390/s22062378 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Aggarwal, Apeksha; Srivastava, Akshat; Agarwal, Ajay; Chahal, Nidhi; Singh, Dilbag; Alnuaim, Abeer Ali; Alhadlaq, Aseel; Lee, Heung-No Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning |
title | Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning |
title_full | Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning |
title_fullStr | Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning |
title_full_unstemmed | Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning |
title_short | Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning |
title_sort | two-way feature extraction for speech emotion recognition using deep learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8949356/ https://www.ncbi.nlm.nih.gov/pubmed/35336548 http://dx.doi.org/10.3390/s22062378 |
work_keys_str_mv | AT aggarwalapeksha twowayfeatureextractionforspeechemotionrecognitionusingdeeplearning AT srivastavaakshat twowayfeatureextractionforspeechemotionrecognitionusingdeeplearning AT agarwalajay twowayfeatureextractionforspeechemotionrecognitionusingdeeplearning AT chahalnidhi twowayfeatureextractionforspeechemotionrecognitionusingdeeplearning AT singhdilbag twowayfeatureextractionforspeechemotionrecognitionusingdeeplearning AT alnuaimabeerali twowayfeatureextractionforspeechemotionrecognitionusingdeeplearning AT alhadlaqaseel twowayfeatureextractionforspeechemotionrecognitionusingdeeplearning AT leeheungno twowayfeatureextractionforspeechemotionrecognitionusingdeeplearning |