
Effects of Data Augmentations on Speech Emotion Recognition

Data augmentation techniques have recently gained more adoption in speech processing, including speech emotion recognition. Although more data tend to be more effective, there may be a trade-off in which more data will not provide a better model. This paper reports experiments on investigating the e...


Bibliographic Details
Main authors: Atmaja, Bagus Tris, Sasou, Akira
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9415521/
https://www.ncbi.nlm.nih.gov/pubmed/36015717
http://dx.doi.org/10.3390/s22165941
author Atmaja, Bagus Tris
Sasou, Akira
collection PubMed
description Data augmentation techniques have recently gained more adoption in speech processing, including speech emotion recognition. Although more data tend to be more effective, there may be a trade-off in which more data will not provide a better model. This paper reports experiments on investigating the effects of data augmentation in speech emotion recognition. The investigation aims at finding the most useful type of data augmentation and the number of data augmentations for speech emotion recognition in various conditions. The experiments are conducted on the Japanese Twitter-based emotional speech and IEMOCAP datasets. The results show that for speaker-independent data, two data augmentations with glottal source extraction and silence removal exhibited the best performance among others, even with more data augmentation techniques. For the text-independent data (including speaker and text-independent), more data augmentations tend to improve speech emotion recognition performances. The results highlight the trade-off between the number of data augmentations and the performance of speech emotion recognition showing the necessity to choose a proper data augmentation technique for a specific condition.
format Online
Article
Text
id pubmed-9415521
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9415521 2022-08-27 Effects of Data Augmentations on Speech Emotion Recognition Atmaja, Bagus Tris Sasou, Akira Sensors (Basel) Article MDPI 2022-08-09 /pmc/articles/PMC9415521/ /pubmed/36015717 http://dx.doi.org/10.3390/s22165941 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Effects of Data Augmentations on Speech Emotion Recognition
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9415521/
https://www.ncbi.nlm.nih.gov/pubmed/36015717
http://dx.doi.org/10.3390/s22165941