BanglaSER: A speech emotion recognition dataset for the Bangla language

A speech emotion recognition system determines a speaker's emotional state by analyzing the speaker's speech audio signal. It is an essential and, at the same time, challenging task in human-computer interaction systems and is one of the most demanding areas of research using artificial intelligence an...

Bibliographic Details
Main Authors: Das, Rakesh Kumar; Islam, Nahidul; Ahmed, Md. Rayhan; Islam, Salekul; Shatabda, Swakkhar; Islam, A.K.M. Muzahidul
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8980634/
https://www.ncbi.nlm.nih.gov/pubmed/35392615
http://dx.doi.org/10.1016/j.dib.2022.108091
_version_ 1784681433682411520
author Das, Rakesh Kumar
Islam, Nahidul
Ahmed, Md. Rayhan
Islam, Salekul
Shatabda, Swakkhar
Islam, A.K.M. Muzahidul
author_facet Das, Rakesh Kumar
Islam, Nahidul
Ahmed, Md. Rayhan
Islam, Salekul
Shatabda, Swakkhar
Islam, A.K.M. Muzahidul
author_sort Das, Rakesh Kumar
collection PubMed
description A speech emotion recognition system determines a speaker's emotional state by analyzing the speaker's speech audio signal. It is an essential and, at the same time, challenging task in human-computer interaction systems and is one of the most demanding areas of research using artificial intelligence and deep machine learning architectures. Despite being the world's seventh most widely spoken language, Bangla is still classified as a low-resource language for speech emotion recognition tasks because of the inadequate availability of data. There is an apparent lack of speech emotion recognition datasets for this type of research in the Bangla language. To address this problem, this article presents BanglaSER, a Bangla speech emotion recognition dataset. It consists of speech-audio data from 34 nonprofessional participating actors, balanced between 17 male and 17 female speakers, aged 19 to 47 years. The dataset contains 1467 Bangla speech-audio recordings of five basic human emotional states, namely angry, happy, neutral, sad, and surprise. Three trials are conducted for each emotional state. Hence, the dataset comprises 3 statements × 3 repetitions × 4 emotional states (angry, happy, sad, and surprise) × 34 participating speakers = 1224 recordings, plus 3 statements × 3 repetitions × 1 emotional state (neutral) × 27 participating speakers = 243 recordings, for a total of 1467 recordings. The BanglaSER dataset was created by recording speech audio with smartphones and laptops. It has a balanced number of recordings in each category, with evenly distributed male and female participating actors, and would serve as an essential training dataset for Bangla speech emotion recognition models in terms of generalization.
BanglaSER is compatible with various deep learning architectures such as convolutional neural networks, long short-term memory networks, gated recurrent units, and Transformers. The dataset is available at https://data.mendeley.com/datasets/t9h6p943xy/5 and can be used for research purposes.
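The recording count given in the description can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming only the figures stated in the abstract (3 statements, 3 repetitions, 34 speakers for four emotions, 27 speakers for neutral); the variable and function names are illustrative and are not part of the dataset or its tooling.

```python
# Figures taken from the abstract; all names here are hypothetical.
STATEMENTS = 3
REPETITIONS = 3
SPEAKERS_PER_EMOTION = {
    "angry": 34,
    "happy": 34,
    "sad": 34,
    "surprise": 34,
    "neutral": 27,  # the abstract reports fewer speakers for neutral
}


def recordings_per_emotion(statements, repetitions, speakers_per_emotion):
    """Return the number of recordings for each emotion label."""
    return {
        emotion: statements * repetitions * speakers
        for emotion, speakers in speakers_per_emotion.items()
    }


counts = recordings_per_emotion(STATEMENTS, REPETITIONS, SPEAKERS_PER_EMOTION)
total = sum(counts.values())

print(counts)  # per-emotion recording counts
print(total)   # overall recording count
```

Under these figures, each of the four non-neutral emotions contributes 306 recordings and neutral contributes 243, which sums to the 1467 recordings reported.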
format Online
Article
Text
id pubmed-8980634
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8980634 2022-04-06 BanglaSER: A speech emotion recognition dataset for the Bangla language Das, Rakesh Kumar Islam, Nahidul Ahmed, Md. Rayhan Islam, Salekul Shatabda, Swakkhar Islam, A.K.M. Muzahidul Data Brief Data Article A speech emotion recognition system determines a speaker's emotional state by analyzing the speaker's speech audio signal. It is an essential and, at the same time, challenging task in human-computer interaction systems and is one of the most demanding areas of research using artificial intelligence and deep machine learning architectures. Despite being the world's seventh most widely spoken language, Bangla is still classified as a low-resource language for speech emotion recognition tasks because of the inadequate availability of data. There is an apparent lack of speech emotion recognition datasets for this type of research in the Bangla language. To address this problem, this article presents BanglaSER, a Bangla speech emotion recognition dataset. It consists of speech-audio data from 34 nonprofessional participating actors, balanced between 17 male and 17 female speakers, aged 19 to 47 years. The dataset contains 1467 Bangla speech-audio recordings of five basic human emotional states, namely angry, happy, neutral, sad, and surprise. Three trials are conducted for each emotional state. Hence, the dataset comprises 3 statements × 3 repetitions × 4 emotional states (angry, happy, sad, and surprise) × 34 participating speakers = 1224 recordings, plus 3 statements × 3 repetitions × 1 emotional state (neutral) × 27 participating speakers = 243 recordings, for a total of 1467 recordings.
The BanglaSER dataset was created by recording speech audio with smartphones and laptops. It has a balanced number of recordings in each category, with evenly distributed male and female participating actors, and would serve as an essential training dataset for Bangla speech emotion recognition models in terms of generalization. BanglaSER is compatible with various deep learning architectures such as convolutional neural networks, long short-term memory networks, gated recurrent units, and Transformers. The dataset is available at https://data.mendeley.com/datasets/t9h6p943xy/5 and can be used for research purposes. Elsevier 2022-03-22 /pmc/articles/PMC8980634/ /pubmed/35392615 http://dx.doi.org/10.1016/j.dib.2022.108091 Text en © 2022 The Author(s). Published by Elsevier Inc. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Data Article
Das, Rakesh Kumar
Islam, Nahidul
Ahmed, Md. Rayhan
Islam, Salekul
Shatabda, Swakkhar
Islam, A.K.M. Muzahidul
BanglaSER: A speech emotion recognition dataset for the Bangla language
title BanglaSER: A speech emotion recognition dataset for the Bangla language
title_full BanglaSER: A speech emotion recognition dataset for the Bangla language
title_fullStr BanglaSER: A speech emotion recognition dataset for the Bangla language
title_full_unstemmed BanglaSER: A speech emotion recognition dataset for the Bangla language
title_short BanglaSER: A speech emotion recognition dataset for the Bangla language
title_sort banglaser: a speech emotion recognition dataset for the bangla language
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8980634/
https://www.ncbi.nlm.nih.gov/pubmed/35392615
http://dx.doi.org/10.1016/j.dib.2022.108091