
KBES: A dataset for realistic Bangla speech emotion recognition with intensity level

Speech Emotion Recognition (SER) identifies and categorizes emotional states by analyzing speech signals. SER is an emerging research area using machine learning and deep learning techniques due to its socio-cultural and business importance. An appropriate dataset is an important resource for SER re...

Bibliographic Details
Main Authors: Billah, Md. Masum; Sarker, Md. Likhon; Akhand, M. A. H.
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10641593/
https://www.ncbi.nlm.nih.gov/pubmed/37965597
http://dx.doi.org/10.1016/j.dib.2023.109741
author Billah, Md. Masum
Sarker, Md. Likhon
Akhand, M. A. H.
collection PubMed
description Speech Emotion Recognition (SER) identifies and categorizes emotional states by analyzing speech signals. SER is an emerging research area that uses machine learning and deep learning techniques, and it has drawn interest because of its socio-cultural and business importance. An appropriate dataset is an important resource for SER-related studies in a particular language. SER datasets are scarce for Bangla, although it is one of the most spoken languages in the world. The few existing Bangla SER datasets consist of only a few dialogs with a minimal number of actors, which makes them unsuitable for real-world applications. Moreover, the existing datasets do not consider the intensity level of emotions, even though the intensity of a specific emotional expression, such as anger or sadness, plays a crucial role in social behavior. Therefore, a realistic Bangla speech dataset, called the KUET Bangla Emotional Speech (KBES) dataset, is developed in this study. The dataset consists of 900 audio signals (i.e., speech dialogs) from 35 actors (20 females and 15 males) with diverse age ranges. The speech dialogs are sourced from Bangla telefilms, dramas, TV series, and web series. There are five emotional categories: Neutral, Happy, Sad, Angry, and Disgust. Except for Neutral, the samples of each emotion are divided into two intensity levels: Low and High. The distinguishing feature of the dataset is that the speech dialogs are almost unique and come from a relatively large number of actors, whereas existing datasets (such as SUBESCO and BanglaSER) contain a few pre-defined dialogs spoken repeatedly by a few actors or research volunteers in a laboratory environment. Finally, the KBES dataset is posed as a nine-class problem that classifies emotions into nine categories: Neutral, Happy (Low), Happy (High), Sad (Low), Sad (High), Angry (Low), Angry (High), Disgust (Low), and Disgust (High). The dataset is kept balanced, with 100 samples for each of the nine classes; the 100 samples of each class are also gender balanced, with 50 samples each from male and female actors. Compared with the existing SER datasets, the developed dataset appears to be more realistic.
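The class layout described in the abstract can be checked with a little arithmetic: four emotions with two intensity levels each, plus Neutral, give nine classes, and 100 samples per class give 900 recordings. The following is a minimal sketch of that layout only. The names `kbes_classes`, `LABEL_TO_INDEX`, and `expected_counts`, the integer class indices, and the label strings are assumptions made for illustration; the record does not specify how the released files are named or labeled.

```python
from itertools import product

# Taken from the abstract: four emotions carry an intensity level, Neutral does not.
EMOTIONS_WITH_INTENSITY = ["Happy", "Sad", "Angry", "Disgust"]
INTENSITIES = ["Low", "High"]
SAMPLES_PER_GENDER_PER_CLASS = 50  # 50 female + 50 male = 100 per class


def kbes_classes():
    """Return the nine class labels in the order listed in the abstract."""
    labels = ["Neutral"]
    labels += [f"{emotion} ({level})"
               for emotion, level in product(EMOTIONS_WITH_INTENSITY, INTENSITIES)]
    return labels


CLASSES = kbes_classes()

# Hypothetical integer encoding for a classifier; not defined by the dataset record.
LABEL_TO_INDEX = {label: index for index, label in enumerate(CLASSES)}


def expected_counts():
    """Expected number of samples per (class, gender) cell for a balanced dataset."""
    return {(label, gender): SAMPLES_PER_GENDER_PER_CLASS
            for label in CLASSES
            for gender in ("female", "male")}


total = sum(expected_counts().values())
print(len(CLASSES), total)
```

A table like the one returned by `expected_counts()` could be compared against counts derived from the actual files to confirm that a downloaded copy is balanced as described.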
format Online
Article
Text
id pubmed-10641593
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10641593 2023-11-14 KBES: A dataset for realistic Bangla speech emotion recognition with intensity level. Billah, Md. Masum; Sarker, Md. Likhon; Akhand, M. A. H. Data Brief. Data Article. Elsevier 2023-10-31 /pmc/articles/PMC10641593/ /pubmed/37965597 http://dx.doi.org/10.1016/j.dib.2023.109741 Text en © 2023 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title KBES: A dataset for realistic Bangla speech emotion recognition with intensity level
topic Data Article