Benchmarking open source and paid services for speech to text: an analysis of quality and input variety
INTRODUCTION: Speech to text (STT) technology has seen increased usage in recent years for automating transcription of spoken language. To choose the most suitable tool for a given task, it is essential to evaluate the performance and quality of both open source and paid STT services. METHODS: In th...
Main authors: | Ferraro, Antonino; Galli, Antonio; La Gatta, Valerio; Postiglione, Marco |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10548127/ https://www.ncbi.nlm.nih.gov/pubmed/37799510 http://dx.doi.org/10.3389/fdata.2023.1210559 |
_version_ | 1785115210021863424 |
---|---|
author | Ferraro, Antonino Galli, Antonio La Gatta, Valerio Postiglione, Marco |
author_facet | Ferraro, Antonino Galli, Antonio La Gatta, Valerio Postiglione, Marco |
author_sort | Ferraro, Antonino |
collection | PubMed |
description | INTRODUCTION: Speech to text (STT) technology has seen increased usage in recent years for automating transcription of spoken language. To choose the most suitable tool for a given task, it is essential to evaluate the performance and quality of both open source and paid STT services. METHODS: In this paper, we conduct a benchmarking study of open source and paid STT services, with a specific focus on assessing their performance concerning the variety of input text. We utilize six datasets obtained from diverse sources, including interviews, lectures, and speeches, as input for the STT tools. The evaluation of the instruments employs the Word Error Rate (WER), a standard metric for STT evaluation. RESULTS: Our analysis of the results demonstrates significant variations in the performance of the STT tools based on the input text. Certain tools exhibit superior performance on specific types of audio samples compared to others. Our study provides insights into STT tool performance when handling substantial data volumes, as well as the challenges and opportunities posed by the multimedia nature of the data. DISCUSSION: Although paid services generally demonstrate better accuracy and speed compared to open source alternatives, their performance remains dependent on the input text. The study highlights the need for considering specific requirements and characteristics of the audio samples when selecting an appropriate STT tool. |
format | Online Article Text |
id | pubmed-10548127 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10548127 2023-10-05 Benchmarking open source and paid services for speech to text: an analysis of quality and input variety Ferraro, Antonino Galli, Antonio La Gatta, Valerio Postiglione, Marco Front Big Data Big Data INTRODUCTION: Speech to text (STT) technology has seen increased usage in recent years for automating transcription of spoken language. To choose the most suitable tool for a given task, it is essential to evaluate the performance and quality of both open source and paid STT services. METHODS: In this paper, we conduct a benchmarking study of open source and paid STT services, with a specific focus on assessing their performance concerning the variety of input text. We utilize six datasets obtained from diverse sources, including interviews, lectures, and speeches, as input for the STT tools. The evaluation of the instruments employs the Word Error Rate (WER), a standard metric for STT evaluation. RESULTS: Our analysis of the results demonstrates significant variations in the performance of the STT tools based on the input text. Certain tools exhibit superior performance on specific types of audio samples compared to others. Our study provides insights into STT tool performance when handling substantial data volumes, as well as the challenges and opportunities posed by the multimedia nature of the data. DISCUSSION: Although paid services generally demonstrate better accuracy and speed compared to open source alternatives, their performance remains dependent on the input text. The study highlights the need for considering specific requirements and characteristics of the audio samples when selecting an appropriate STT tool. Frontiers Media S.A. 2023-09-20 /pmc/articles/PMC10548127/ /pubmed/37799510 http://dx.doi.org/10.3389/fdata.2023.1210559 Text en Copyright © 2023 Ferraro, Galli, La Gatta and Postiglione.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Big Data Ferraro, Antonino Galli, Antonio La Gatta, Valerio Postiglione, Marco Benchmarking open source and paid services for speech to text: an analysis of quality and input variety |
title | Benchmarking open source and paid services for speech to text: an analysis of quality and input variety |
title_full | Benchmarking open source and paid services for speech to text: an analysis of quality and input variety |
title_fullStr | Benchmarking open source and paid services for speech to text: an analysis of quality and input variety |
title_full_unstemmed | Benchmarking open source and paid services for speech to text: an analysis of quality and input variety |
title_short | Benchmarking open source and paid services for speech to text: an analysis of quality and input variety |
title_sort | benchmarking open source and paid services for speech to text: an analysis of quality and input variety |
topic | Big Data |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10548127/ https://www.ncbi.nlm.nih.gov/pubmed/37799510 http://dx.doi.org/10.3389/fdata.2023.1210559 |
work_keys_str_mv | AT ferraroantonino benchmarkingopensourceandpaidservicesforspeechtotextananalysisofqualityandinputvariety AT galliantonio benchmarkingopensourceandpaidservicesforspeechtotextananalysisofqualityandinputvariety AT lagattavalerio benchmarkingopensourceandpaidservicesforspeechtotextananalysisofqualityandinputvariety AT postiglionemarco benchmarkingopensourceandpaidservicesforspeechtotextananalysisofqualityandinputvariety |