
Benchmarking open source and paid services for speech to text: an analysis of quality and input variety

INTRODUCTION: Speech to text (STT) technology has seen increased usage in recent years for automating transcription of spoken language. To choose the most suitable tool for a given task, it is essential to evaluate the performance and quality of both open source and paid STT services. METHODS: In th...


Bibliographic Details
Main authors: Ferraro, Antonino; Galli, Antonio; La Gatta, Valerio; Postiglione, Marco
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10548127/
https://www.ncbi.nlm.nih.gov/pubmed/37799510
http://dx.doi.org/10.3389/fdata.2023.1210559
_version_ 1785115210021863424
author Ferraro, Antonino
Galli, Antonio
La Gatta, Valerio
Postiglione, Marco
author_facet Ferraro, Antonino
Galli, Antonio
La Gatta, Valerio
Postiglione, Marco
author_sort Ferraro, Antonino
collection PubMed
description INTRODUCTION: Speech to text (STT) technology has seen increased usage in recent years for automating transcription of spoken language. To choose the most suitable tool for a given task, it is essential to evaluate the performance and quality of both open source and paid STT services. METHODS: In this paper, we conduct a benchmarking study of open source and paid STT services, with a specific focus on assessing their performance concerning the variety of input text. We utilize six datasets obtained from diverse sources, including interviews, lectures, and speeches, as input for the STT tools. The evaluation of the instruments employs the Word Error Rate (WER), a standard metric for STT evaluation. RESULTS: Our analysis of the results demonstrates significant variations in the performance of the STT tools based on the input text. Certain tools exhibit superior performance on specific types of audio samples compared to others. Our study provides insights into STT tool performance when handling substantial data volumes, as well as the challenges and opportunities posed by the multimedia nature of the data. DISCUSSION: Although paid services generally demonstrate better accuracy and speed compared to open source alternatives, their performance remains dependent on the input text. The study highlights the need for considering specific requirements and characteristics of the audio samples when selecting an appropriate STT tool.
format Online
Article
Text
id pubmed-10548127
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-105481272023-10-05 Benchmarking open source and paid services for speech to text: an analysis of quality and input variety Ferraro, Antonino Galli, Antonio La Gatta, Valerio Postiglione, Marco Front Big Data Big Data INTRODUCTION: Speech to text (STT) technology has seen increased usage in recent years for automating transcription of spoken language. To choose the most suitable tool for a given task, it is essential to evaluate the performance and quality of both open source and paid STT services. METHODS: In this paper, we conduct a benchmarking study of open source and paid STT services, with a specific focus on assessing their performance concerning the variety of input text. We utilize six datasets obtained from diverse sources, including interviews, lectures, and speeches, as input for the STT tools. The evaluation of the instruments employs the Word Error Rate (WER), a standard metric for STT evaluation. RESULTS: Our analysis of the results demonstrates significant variations in the performance of the STT tools based on the input text. Certain tools exhibit superior performance on specific types of audio samples compared to others. Our study provides insights into STT tool performance when handling substantial data volumes, as well as the challenges and opportunities posed by the multimedia nature of the data. DISCUSSION: Although paid services generally demonstrate better accuracy and speed compared to open source alternatives, their performance remains dependent on the input text. The study highlights the need for considering specific requirements and characteristics of the audio samples when selecting an appropriate STT tool. Frontiers Media S.A. 2023-09-20 /pmc/articles/PMC10548127/ /pubmed/37799510 http://dx.doi.org/10.3389/fdata.2023.1210559 Text en Copyright © 2023 Ferraro, Galli, La Gatta and Postiglione.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Big Data
Ferraro, Antonino
Galli, Antonio
La Gatta, Valerio
Postiglione, Marco
Benchmarking open source and paid services for speech to text: an analysis of quality and input variety
title Benchmarking open source and paid services for speech to text: an analysis of quality and input variety
title_full Benchmarking open source and paid services for speech to text: an analysis of quality and input variety
title_fullStr Benchmarking open source and paid services for speech to text: an analysis of quality and input variety
title_full_unstemmed Benchmarking open source and paid services for speech to text: an analysis of quality and input variety
title_short Benchmarking open source and paid services for speech to text: an analysis of quality and input variety
title_sort benchmarking open source and paid services for speech to text: an analysis of quality and input variety
topic Big Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10548127/
https://www.ncbi.nlm.nih.gov/pubmed/37799510
http://dx.doi.org/10.3389/fdata.2023.1210559
work_keys_str_mv AT ferraroantonino benchmarkingopensourceandpaidservicesforspeechtotextananalysisofqualityandinputvariety
AT galliantonio benchmarkingopensourceandpaidservicesforspeechtotextananalysisofqualityandinputvariety
AT lagattavalerio benchmarkingopensourceandpaidservicesforspeechtotextananalysisofqualityandinputvariety
AT postiglionemarco benchmarkingopensourceandpaidservicesforspeechtotextananalysisofqualityandinputvariety
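
The abstract in this record names the Word Error Rate (WER) as the evaluation metric without defining it. As a rough illustration only, WER is commonly computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and an STT hypothesis, divided by the number of reference words. The sketch below assumes whitespace tokenization and lowercasing as normalization; the record does not state which preprocessing the authors applied, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization (not taken from the paper): lowercase, split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference. A lower value means a closer match to the reference transcript.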