
The First Vietnamese FOSD-Tacotron-2-based Text-to-Speech Model Dataset

Recent trends in voicebot application development have enabled utilization of both speech-to-text and text-to-speech (TTS) generation techniques. In order to generate a voice response to a given speech, one needs to use a TTS engine. The recently developed TTS engines are shifting towards end-to-end...


Bibliographic Details
Main Author:	Tran, Duc Chung
Format:	Online Article Text
Language:	English
Published:	Elsevier 2020
Subjects:	Computer Science
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7287261/ https://www.ncbi.nlm.nih.gov/pubmed/32551347 http://dx.doi.org/10.1016/j.dib.2020.105775

_version_	1783545034226794496
author	Tran, Duc Chung
author_facet	Tran, Duc Chung
author_sort	Tran, Duc Chung
collection	PubMed
description	Recent trends in voicebot application development have enabled utilization of both speech-to-text and text-to-speech (TTS) generation techniques. In order to generate a voice response to a given speech, one needs to use a TTS engine. The recently developed TTS engines are shifting towards end-to-end approaches utilizing models such as Tacotron, Tacotron-2, WaveNet, and WaveGlow. This shift lets a TTS service provider focus on developing training and validation datasets comprising labelled texts and recorded speech, instead of designing an entirely new model that outperforms the others, which is time-consuming and costly. In this context, this work introduces the first Vietnamese FPT Open Speech Data (FOSD)-Tacotron-2-based TTS model dataset. The dataset comprises a configuration file in .json format; training and validation text input files (in .csv format); a 225,000-step checkpoint of the trained model; and several sample generated audio files. The published dataset is valuable as a reference model for benchmarking against other newly developed TTS models and engines. In addition, it opens an entirely new TTS research optimization problem: how to effectively generate speech from text given a black-box (trained) TTS model and its training and validation input texts.
format	Online Article Text
id	pubmed-7287261
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-72872612020-06-17 The First Vietnamese FOSD-Tacotron-2-based Text-to-Speech Model Dataset Tran, Duc Chung Data Brief Computer Science Recent trends in voicebot application development have enabled utilization of both speech-to-text and text-to-speech (TTS) generation techniques. In order to generate a voice response to a given speech, one needs to use a TTS engine. The recently developed TTS engines are shifting towards end-to-end approaches utilizing models such as Tacotron, Tacotron-2, WaveNet, and WaveGlow. This shift lets a TTS service provider focus on developing training and validation datasets comprising labelled texts and recorded speech, instead of designing an entirely new model that outperforms the others, which is time-consuming and costly. In this context, this work introduces the first Vietnamese FPT Open Speech Data (FOSD)-Tacotron-2-based TTS model dataset. The dataset comprises a configuration file in .json format; training and validation text input files (in .csv format); a 225,000-step checkpoint of the trained model; and several sample generated audio files. The published dataset is valuable as a reference model for benchmarking against other newly developed TTS models and engines. In addition, it opens an entirely new TTS research optimization problem: how to effectively generate speech from text given a black-box (trained) TTS model and its training and validation input texts. Elsevier 2020-05-27 /pmc/articles/PMC7287261/ /pubmed/32551347 http://dx.doi.org/10.1016/j.dib.2020.105775 Text en © 2020 The Author(s) http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Computer Science Tran, Duc Chung The First Vietnamese FOSD-Tacotron-2-based Text-to-Speech Model Dataset
title	The First Vietnamese FOSD-Tacotron-2-based Text-to-Speech Model Dataset
title_full	The First Vietnamese FOSD-Tacotron-2-based Text-to-Speech Model Dataset
title_fullStr	The First Vietnamese FOSD-Tacotron-2-based Text-to-Speech Model Dataset
title_full_unstemmed	The First Vietnamese FOSD-Tacotron-2-based Text-to-Speech Model Dataset
title_short	The First Vietnamese FOSD-Tacotron-2-based Text-to-Speech Model Dataset
title_sort	first vietnamese fosd-tacotron-2-based text-to-speech model dataset
topic	Computer Science
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7287261/ https://www.ncbi.nlm.nih.gov/pubmed/32551347 http://dx.doi.org/10.1016/j.dib.2020.105775
work_keys_str_mv	AT tranducchung thefirstvietnamesefosdtacotron2basedtexttospeechmodeldataset AT tranducchung firstvietnamesefosdtacotron2basedtexttospeechmodeldataset
