
Using Hybrid HMM/DNN Embedding Extractor Models in Computational Paralinguistic Tasks

Bibliographic details
Main authors: Vetráb, Mercedes; Gosztolya, Gábor
Format: Online; Article; Text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10256007/
https://www.ncbi.nlm.nih.gov/pubmed/37299935
http://dx.doi.org/10.3390/s23115208
collection PubMed
description The field of computational paralinguistics emerged from automatic speech processing, and it covers a wide range of tasks involving different phenomena present in human speech. It focuses on the non-verbal content of human speech, including tasks such as spoken emotion recognition, conflict intensity estimation and sleepiness detection from speech, which have straightforward applications in remote monitoring with acoustic sensors. The two main technical issues in computational paralinguistics are (1) handling varying-length utterances with traditional classifiers and (2) training models on relatively small corpora. In this study, we present a method that combines automatic speech recognition (ASR) and paralinguistic approaches and handles both of these issues. That is, we trained an HMM/DNN hybrid acoustic model on a general ASR corpus, which was then used as a source of embeddings employed as features for several paralinguistic tasks. To convert the local embeddings into utterance-level features, we experimented with five different aggregation methods, namely mean, standard deviation, skewness, kurtosis and the ratio of non-zero activations. Our results show that the proposed feature extraction technique consistently outperforms the widely used x-vector baseline, independently of the actual paralinguistic task investigated. Furthermore, the aggregation techniques can also be combined effectively, leading to further improvements depending on the task and on the layer of the neural network serving as the source of the local embeddings. Overall, based on our experimental results, the proposed method can be considered a competitive and resource-efficient approach for a wide range of computational paralinguistic tasks.
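The aggregation step described above, which turns a variable number of frame-level embeddings into one fixed-length utterance-level vector, can be illustrated with a minimal NumPy sketch. It assumes the embeddings from one hidden layer of the HMM/DNN acoustic model are already available as a (frames × dimensions) array. The function name `aggregate_embeddings`, the biased (population) moment estimators, the excess-kurtosis convention, setting higher moments to 0 for constant dimensions, and the concatenation order are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# The five statistics named in the abstract, in an assumed concatenation order.
ALL_STATS = ("mean", "std", "skew", "kurt", "nonzero")


def aggregate_embeddings(frames, methods=ALL_STATS, eps=1e-12):
    """Map a (T, D) matrix of frame-level embeddings to one utterance-level vector.

    Hypothetical helper for illustration: each requested statistic is computed
    per embedding dimension over the T frames, and the results are concatenated,
    so the output has len(methods) * D values whatever the utterance length T.
    """
    x = np.asarray(frames, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] == 0:
        raise ValueError("expected a non-empty (frames, dims) matrix")

    mean = x.mean(axis=0)
    std = x.std(axis=0)  # population standard deviation (ddof=0)

    # Dimensions with (near-)zero variance, e.g. a ReLU unit that never fires,
    # have undefined higher moments; here they are set to 0 (an assumption).
    valid = std > eps
    z = np.where(valid, (x - mean) / np.where(valid, std, 1.0), 0.0)

    stats = {
        "mean": mean,
        "std": std,
        "skew": (z ** 3).mean(axis=0),  # biased (population) skewness
        "kurt": np.where(valid, (z ** 4).mean(axis=0) - 3.0, 0.0),  # excess kurtosis
        "nonzero": (x != 0).mean(axis=0),  # ratio of non-zero activations
    }
    return np.concatenate([stats[m] for m in methods])


# Usage: two utterances of different length yield vectors of equal size.
rng = np.random.default_rng(0)
short_utt = np.maximum(rng.standard_normal((120, 256)), 0.0)  # ReLU-like activations
long_utt = np.maximum(rng.standard_normal((900, 256)), 0.0)
print(aggregate_embeddings(short_utt).shape, aggregate_embeddings(long_utt).shape)
# (1280,) (1280,)
```

Passing a subset such as `methods=("mean", "std")` mimics combining only some of the aggregations; the 256-dimensional layer size above is hypothetical. Because the output size depends only on the embedding dimension and the number of statistics, a traditional fixed-input classifier can consume utterances of any length.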
id pubmed-10256007
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Sensors (Basel), MDPI, 2023-05-30 (pubmed-10256007, 2023-06-10)
license © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).