
Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications

There are many speech and audio processing applications and their number is growing. They may cover a wide range of tasks, each having different requirements on the processed speech or audio signals and, therefore, indirectly, on the audio sensors as well. This article reports on tests and evaluatio...


Bibliographic Details
Main authors: Kacur, Juraj, Puterka, Boris, Pavlovicova, Jarmila, Oravec, Milos
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9414251/
https://www.ncbi.nlm.nih.gov/pubmed/36016068
http://dx.doi.org/10.3390/s22166304
_version_ 1784775945197977600
author Kacur, Juraj
Puterka, Boris
Pavlovicova, Jarmila
Oravec, Milos
author_facet Kacur, Juraj
Puterka, Boris
Pavlovicova, Jarmila
Oravec, Milos
author_sort Kacur, Juraj
collection PubMed
description There are many speech and audio processing applications and their number is growing. They may cover a wide range of tasks, each having different requirements on the processed speech or audio signals and, therefore, indirectly, on the audio sensors as well. This article reports on tests and evaluation of the effect of basic physical properties of speech and audio signals on the recognition accuracy of major speech/audio processing applications, i.e., speech recognition, speaker recognition, speech emotion recognition, and audio event recognition. A particular focus is on frequency ranges, time intervals, a precision of representation (quantization), and complexities of models suitable for each class of applications. Using domain-specific datasets, eligible feature extraction methods and complex neural network models, it was possible to test and evaluate the effect of basic speech and audio signal properties on the achieved accuracies for each group of applications. The tests confirmed that the basic parameters do affect the overall performance and, moreover, this effect is domain-dependent. Therefore, accurate knowledge of the extent of these effects can be valuable for system designers when selecting appropriate hardware, sensors, architecture, and software for a particular application, especially in the case of limited resources.
format Online
Article
Text
id pubmed-9414251
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-94142512022-08-27 Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications Kacur, Juraj Puterka, Boris Pavlovicova, Jarmila Oravec, Milos Sensors (Basel) Article There are many speech and audio processing applications and their number is growing. They may cover a wide range of tasks, each having different requirements on the processed speech or audio signals and, therefore, indirectly, on the audio sensors as well. This article reports on tests and evaluation of the effect of basic physical properties of speech and audio signals on the recognition accuracy of major speech/audio processing applications, i.e., speech recognition, speaker recognition, speech emotion recognition, and audio event recognition. A particular focus is on frequency ranges, time intervals, a precision of representation (quantization), and complexities of models suitable for each class of applications. Using domain-specific datasets, eligible feature extraction methods and complex neural network models, it was possible to test and evaluate the effect of basic speech and audio signal properties on the achieved accuracies for each group of applications. The tests confirmed that the basic parameters do affect the overall performance and, moreover, this effect is domain-dependent. Therefore, accurate knowledge of the extent of these effects can be valuable for system designers when selecting appropriate hardware, sensors, architecture, and software for a particular application, especially in the case of limited resources. MDPI 2022-08-22 /pmc/articles/PMC9414251/ /pubmed/36016068 http://dx.doi.org/10.3390/s22166304 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Kacur, Juraj
Puterka, Boris
Pavlovicova, Jarmila
Oravec, Milos
Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications
title Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications
title_sort frequency, time, representation and modeling aspects for major speech and audio processing applications
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9414251/
https://www.ncbi.nlm.nih.gov/pubmed/36016068
http://dx.doi.org/10.3390/s22166304