Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications
There are many speech and audio processing applications and their number is growing. They may cover a wide range of tasks, each having different requirements on the processed speech or audio signals and, therefore, indirectly, on the audio sensors as well. This article reports on tests and evaluatio...
Main authors: | Kacur, Juraj; Puterka, Boris; Pavlovicova, Jarmila; Oravec, Milos |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9414251/ https://www.ncbi.nlm.nih.gov/pubmed/36016068 http://dx.doi.org/10.3390/s22166304 |
_version_ | 1784775945197977600 |
---|---|
author | Kacur, Juraj; Puterka, Boris; Pavlovicova, Jarmila; Oravec, Milos |
author_facet | Kacur, Juraj; Puterka, Boris; Pavlovicova, Jarmila; Oravec, Milos |
author_sort | Kacur, Juraj |
collection | PubMed |
description | There are many speech and audio processing applications and their number is growing. They may cover a wide range of tasks, each having different requirements on the processed speech or audio signals and, therefore, indirectly, on the audio sensors as well. This article reports on tests and evaluation of the effect of basic physical properties of speech and audio signals on the recognition accuracy of major speech/audio processing applications, i.e., speech recognition, speaker recognition, speech emotion recognition, and audio event recognition. A particular focus is on frequency ranges, time intervals, a precision of representation (quantization), and complexities of models suitable for each class of applications. Using domain-specific datasets, eligible feature extraction methods and complex neural network models, it was possible to test and evaluate the effect of basic speech and audio signal properties on the achieved accuracies for each group of applications. The tests confirmed that the basic parameters do affect the overall performance and, moreover, this effect is domain-dependent. Therefore, accurate knowledge of the extent of these effects can be valuable for system designers when selecting appropriate hardware, sensors, architecture, and software for a particular application, especially in the case of limited resources. |
format | Online Article Text |
id | pubmed-9414251 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9414251 2022-08-27 Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications Kacur, Juraj; Puterka, Boris; Pavlovicova, Jarmila; Oravec, Milos Sensors (Basel) Article There are many speech and audio processing applications and their number is growing. They may cover a wide range of tasks, each having different requirements on the processed speech or audio signals and, therefore, indirectly, on the audio sensors as well. This article reports on tests and evaluation of the effect of basic physical properties of speech and audio signals on the recognition accuracy of major speech/audio processing applications, i.e., speech recognition, speaker recognition, speech emotion recognition, and audio event recognition. A particular focus is on frequency ranges, time intervals, a precision of representation (quantization), and complexities of models suitable for each class of applications. Using domain-specific datasets, eligible feature extraction methods and complex neural network models, it was possible to test and evaluate the effect of basic speech and audio signal properties on the achieved accuracies for each group of applications. The tests confirmed that the basic parameters do affect the overall performance and, moreover, this effect is domain-dependent. Therefore, accurate knowledge of the extent of these effects can be valuable for system designers when selecting appropriate hardware, sensors, architecture, and software for a particular application, especially in the case of limited resources. MDPI 2022-08-22 /pmc/articles/PMC9414251/ /pubmed/36016068 http://dx.doi.org/10.3390/s22166304 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Kacur, Juraj Puterka, Boris Pavlovicova, Jarmila Oravec, Milos Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications |
title | Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9414251/ https://www.ncbi.nlm.nih.gov/pubmed/36016068 http://dx.doi.org/10.3390/s22166304 |