
Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks

Several researchers have contemplated deep learning-based post-filters to increase the quality of statistical parametric speech synthesis; these post-filters map the synthetic speech to the natural speech, considering the different parameters separately and trying to reduce the gap between them. Long Short-term Memory (LSTM) neural networks have been applied successfully for this purpose, but there are still many aspects to improve in the results and in the process itself. In this paper, we introduce a new pre-training approach for the LSTM, with the objective of enhancing the quality of the synthesized speech, particularly in the spectrum, in a more efficient manner. Our approach begins with an auto-associative training of one LSTM network, which is then used as an initialization for the post-filters. We show the advantages of this initialization for enhancing the Mel-Frequency Cepstral parameters of synthetic speech. Results show that this initialization achieves better enhancement of the statistical parametric speech spectrum in most cases, compared to the common random initialization of the networks.
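The abstract only outlines the method at a high level. The following is a minimal, illustrative sketch of the idea in PyTorch, not the authors' implementation: the MFCC dimension, hidden size, and data loaders (natural_loader with natural/natural pairs, paired_loader with synthetic/natural pairs) are assumptions, since the record does not specify them.

    import torch
    import torch.nn as nn

    N_MFCC = 25    # assumed number of Mel-Frequency Cepstral Coefficients per frame
    HIDDEN = 256   # assumed LSTM hidden size

    class LSTMPostFilter(nn.Module):
        # One LSTM layer followed by a linear projection back to MFCC space.
        def __init__(self, n_feats=N_MFCC, hidden=HIDDEN):
            super().__init__()
            self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
            self.proj = nn.Linear(hidden, n_feats)

        def forward(self, x):              # x: (batch, frames, n_feats)
            h, _ = self.lstm(x)
            return self.proj(h)

    def train(model, loader, epochs=10, lr=1e-3):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for inputs, targets in loader:
                opt.zero_grad()
                loss = loss_fn(model(inputs), targets)
                loss.backward()
                opt.step()
        return model

    # 1) Auto-associative pre-training: natural MFCC frames serve as both input and
    #    target, so the network first learns an identity-like mapping of natural speech.
    pretrained = LSTMPostFilter()
    # pretrained = train(pretrained, natural_loader)   # natural_loader is hypothetical

    # 2) The post-filter is initialized from the pre-trained weights (instead of the
    #    usual random initialization) and fine-tuned on (synthetic, natural) MFCC pairs.
    postfilter = LSTMPostFilter()
    postfilter.load_state_dict(pretrained.state_dict())
    # postfilter = train(postfilter, paired_loader)    # paired_loader is hypothetical

The intuition, as stated in the abstract, is that pre-training on natural speech gives the post-filter a starting point already adapted to natural-speech statistics, rather than starting from random weights.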


Bibliographic Details
Main Author: Coto-Jiménez, Marvin
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6630405/
https://www.ncbi.nlm.nih.gov/pubmed/31141924
http://dx.doi.org/10.3390/biomimetics4020039
_version_ 1783435294864834560
author Coto-Jiménez, Marvin
author_facet Coto-Jiménez, Marvin
author_sort Coto-Jiménez, Marvin
collection PubMed
description Several researchers have contemplated deep learning-based post-filters to increase the quality of statistical parametric speech synthesis; these post-filters map the synthetic speech to the natural speech, considering the different parameters separately and trying to reduce the gap between them. Long Short-term Memory (LSTM) neural networks have been applied successfully for this purpose, but there are still many aspects to improve in the results and in the process itself. In this paper, we introduce a new pre-training approach for the LSTM, with the objective of enhancing the quality of the synthesized speech, particularly in the spectrum, in a more efficient manner. Our approach begins with an auto-associative training of one LSTM network, which is then used as an initialization for the post-filters. We show the advantages of this initialization for enhancing the Mel-Frequency Cepstral parameters of synthetic speech. Results show that this initialization achieves better enhancement of the statistical parametric speech spectrum in most cases, compared to the common random initialization of the networks.
format Online
Article
Text
id pubmed-6630405
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6630405 2019-08-19 Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks Coto-Jiménez, Marvin Biomimetics (Basel) Article Several researchers have contemplated deep learning-based post-filters to increase the quality of statistical parametric speech synthesis; these post-filters map the synthetic speech to the natural speech, considering the different parameters separately and trying to reduce the gap between them. Long Short-term Memory (LSTM) neural networks have been applied successfully for this purpose, but there are still many aspects to improve in the results and in the process itself. In this paper, we introduce a new pre-training approach for the LSTM, with the objective of enhancing the quality of the synthesized speech, particularly in the spectrum, in a more efficient manner. Our approach begins with an auto-associative training of one LSTM network, which is then used as an initialization for the post-filters. We show the advantages of this initialization for enhancing the Mel-Frequency Cepstral parameters of synthetic speech. Results show that this initialization achieves better enhancement of the statistical parametric speech spectrum in most cases, compared to the common random initialization of the networks. MDPI 2019-05-28 /pmc/articles/PMC6630405/ /pubmed/31141924 http://dx.doi.org/10.3390/biomimetics4020039 Text en © 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Coto-Jiménez, Marvin
Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks
title Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks
title_full Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks
title_fullStr Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks
title_full_unstemmed Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks
title_short Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks
title_sort improving post-filtering of artificial speech using pre-trained lstm neural networks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6630405/
https://www.ncbi.nlm.nih.gov/pubmed/31141924
http://dx.doi.org/10.3390/biomimetics4020039
work_keys_str_mv AT cotojimenezmarvin improvingpostfilteringofartificialspeechusingpretrainedlstmneuralnetworks