Effects of Data Augmentations on Speech Emotion Recognition
Data augmentation techniques have recently gained more adoption in speech processing, including speech emotion recognition. Although more data tend to be more effective, there may be a trade-off in which more data will not provide a better model. This paper reports experiments on investigating the e...
Main authors: | Atmaja, Bagus Tris; Sasou, Akira |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9415521/ https://www.ncbi.nlm.nih.gov/pubmed/36015717 http://dx.doi.org/10.3390/s22165941 |
_version_ | 1784776252658286592 |
---|---|
author | Atmaja, Bagus Tris Sasou, Akira |
author_sort | Atmaja, Bagus Tris |
collection | PubMed |
description | Data augmentation techniques have recently gained more adoption in speech processing, including speech emotion recognition. Although more data tend to be more effective, there may be a trade-off in which more data will not provide a better model. This paper reports experiments on investigating the effects of data augmentation in speech emotion recognition. The investigation aims at finding the most useful type of data augmentation and the number of data augmentations for speech emotion recognition in various conditions. The experiments are conducted on the Japanese Twitter-based emotional speech and IEMOCAP datasets. The results show that for speaker-independent data, two data augmentations with glottal source extraction and silence removal exhibited the best performance among others, even with more data augmentation techniques. For the text-independent data (including speaker and text-independent), more data augmentations tend to improve speech emotion recognition performances. The results highlight the trade-off between the number of data augmentations and the performance of speech emotion recognition showing the necessity to choose a proper data augmentation technique for a specific condition. |
format | Online Article Text |
id | pubmed-9415521 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-94155212022-08-27 Effects of Data Augmentations on Speech Emotion Recognition Atmaja, Bagus Tris Sasou, Akira Sensors (Basel) Article Data augmentation techniques have recently gained more adoption in speech processing, including speech emotion recognition. Although more data tend to be more effective, there may be a trade-off in which more data will not provide a better model. This paper reports experiments on investigating the effects of data augmentation in speech emotion recognition. The investigation aims at finding the most useful type of data augmentation and the number of data augmentations for speech emotion recognition in various conditions. The experiments are conducted on the Japanese Twitter-based emotional speech and IEMOCAP datasets. The results show that for speaker-independent data, two data augmentations with glottal source extraction and silence removal exhibited the best performance among others, even with more data augmentation techniques. For the text-independent data (including speaker and text-independent), more data augmentations tend to improve speech emotion recognition performances. The results highlight the trade-off between the number of data augmentations and the performance of speech emotion recognition showing the necessity to choose a proper data augmentation technique for a specific condition. MDPI 2022-08-09 /pmc/articles/PMC9415521/ /pubmed/36015717 http://dx.doi.org/10.3390/s22165941 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Atmaja, Bagus Tris Sasou, Akira Effects of Data Augmentations on Speech Emotion Recognition |
title | Effects of Data Augmentations on Speech Emotion Recognition |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9415521/ https://www.ncbi.nlm.nih.gov/pubmed/36015717 http://dx.doi.org/10.3390/s22165941 |