
Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning


Bibliographic Details
Main Authors: Aggarwal, Apeksha, Srivastava, Akshat, Agarwal, Ajay, Chahal, Nidhi, Singh, Dilbag, Alnuaim, Abeer Ali, Alhadlaq, Aseel, Lee, Heung-No
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8949356/
https://www.ncbi.nlm.nih.gov/pubmed/35336548
http://dx.doi.org/10.3390/s22062378
collection PubMed
description Recognizing human emotions by machines is a complex task. Deep learning models attempt to automate this process by enabling machines to exhibit learning capabilities. However, identifying human emotions from speech with good performance is still challenging. With the advent of deep learning algorithms, this problem has recently been addressed. However, most past research relied on a single method of feature extraction for training. In this work, two different methods of extracting features are explored for effective speech emotion recognition. A two-way feature extraction scheme is proposed that uses super convergence to extract two sets of potential features from the speech data. For the first feature set, principal component analysis (PCA) is applied, and a deep neural network (DNN) with dense and dropout layers is then implemented. In the second approach, mel-spectrogram images are extracted from the audio files, and these 2D images are given as input to a pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis of both feature-extraction methods, across multiple algorithms and two datasets, are performed. On the RAVDESS dataset, significantly better accuracy was obtained than with numeric features on a DNN.
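The first branch described in the abstract reduces numeric speech features with PCA before feeding them to a dense/dropout DNN. A minimal NumPy sketch of that PCA step is shown below; the feature dimensionality (40 MFCC-style features) and sample count are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project feature vectors onto the top principal components.

    X: (n_samples, n_features) array of per-utterance acoustic features.
    Returns the (n_samples, n_components) reduced representation.
    """
    Xc = X - X.mean(axis=0)                     # center each feature column
    # SVD of the centered data: rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T             # project onto top components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))                  # e.g. 40 MFCC-style features
Z = pca_reduce(X, n_components=10)
print(Z.shape)                                  # (100, 10)
```

In practice the reduced matrix `Z` would be the input to the dense/dropout network, while the second branch would instead rasterize mel-spectrograms as images for VGG-16.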
id pubmed-8949356
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8949356 2022-03-26 Sensors (Basel) Article MDPI 2022-03-19 /pmc/articles/PMC8949356/ /pubmed/35336548 http://dx.doi.org/10.3390/s22062378 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning
topic Article