Implementing a Statistical Parametric Speech Synthesis System for a Patient with Laryngeal Cancer

Total laryngectomy, i.e., the surgical removal of the larynx, has a profound influence on a patient’s quality of life. The procedure results in a loss of natural voice, which in effect constitutes a significant socio-psychological problem for the patient. The main aim of the study was to develop a s...

Bibliographic Details
Main Authors:	Szklanny, Krzysztof; Lachowicz, Jakub
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9099606/ https://www.ncbi.nlm.nih.gov/pubmed/35590877 http://dx.doi.org/10.3390/s22093188

_version_	1784706647623467008
author	Szklanny, Krzysztof; Lachowicz, Jakub
author_facet	Szklanny, Krzysztof; Lachowicz, Jakub
author_sort	Szklanny, Krzysztof
collection	PubMed
description	Total laryngectomy, i.e., the surgical removal of the larynx, has a profound influence on a patient’s quality of life. The procedure results in a loss of natural voice, which constitutes a significant socio-psychological problem for the patient. The main aim of the study was to develop a statistical parametric speech synthesis system for a patient with laryngeal cancer, on the basis of the patient’s speech samples recorded shortly before the surgery, and to check whether it was possible to generate speech of a quality close to that of the original recordings. The recording made use of a representative corpus of the Polish language, consisting of 2150 sentences. The recorded voice showed signs of dysphonia, which was confirmed by the auditory-perceptual RBH scale (roughness, breathiness, hoarseness) and by acoustic analysis using the Acoustic Voice Quality Index (AVQI). The speech synthesis model was trained using the Merlin repository. Twenty-five experts participated in the MUSHRA listening tests, rating the synthetic voice at 69.4 in terms of the professional voice-over talent recording, on a 0–100 scale, which is a very good result. The authors compared the quality of the synthetic voice to another model of synthetic speech trained with the same corpus, but where a voice-over talent provided the recorded speech samples. The same experts rated that voice at 63.63, which means the synthetic voice of the patient with laryngeal cancer obtained a higher score than the synthetic voice based on the talent-voice recordings. As such, the method enabled the creation of a statistical parametric speech synthesizer for patients awaiting total laryngectomy. As a result, the solution would improve the patient’s quality of life and mental wellbeing.
format	Online Article Text
id	pubmed-9099606
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9099606 2022-05-14 Implementing a Statistical Parametric Speech Synthesis System for a Patient with Laryngeal Cancer Szklanny, Krzysztof; Lachowicz, Jakub Sensors (Basel) Article MDPI 2022-04-21 /pmc/articles/PMC9099606/ /pubmed/35590877 http://dx.doi.org/10.3390/s22093188 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Implementing a Statistical Parametric Speech Synthesis System for a Patient with Laryngeal Cancer
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9099606/ https://www.ncbi.nlm.nih.gov/pubmed/35590877 http://dx.doi.org/10.3390/s22093188

Similar Items