
Speaker Adaptation on Articulation and Acoustics for Articulation-to-Speech Synthesis


Bibliographic Details
Main Authors: Cao, Beiming; Wisler, Alan; Wang, Jun
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9416444/
https://www.ncbi.nlm.nih.gov/pubmed/36015817
http://dx.doi.org/10.3390/s22166056
collection PubMed
description Silent speech interfaces (SSIs) convert non-audio bio-signals, such as articulatory movement, to speech. This technology has the potential to recover the speech ability of individuals who have lost their voice but can still articulate (e.g., laryngectomees). Articulation-to-speech (ATS) synthesis is an SSI algorithm design with the advantages of easy implementation and low latency, and it is therefore becoming more popular. Current ATS studies focus on speaker-dependent (SD) models to avoid the large variation of articulatory patterns and acoustic features across speakers. However, these designs are limited by the small data size available from individual speakers. Speaker-adaptation designs that include multiple speakers' data have the potential to address this limitation; however, few prior studies have investigated their performance in ATS. In this paper, we investigated speaker adaptation of both the input articulation and the output acoustic signals (with or without direct inclusion of data from test speakers) using a publicly available electromagnetic articulography (EMA) dataset. We used Procrustes matching for articulation adaptation and voice conversion for voice adaptation. The performance of the ATS models was measured objectively by mel-cepstral distortion (MCD). The synthetic speech samples were generated and are provided in the supplementary material. The results demonstrated the improvement brought by both Procrustes matching and voice conversion to speaker-independent ATS. With the direct inclusion of target-speaker data in the training process, speaker-adaptive ATS achieved performance comparable to speaker-dependent ATS. To our knowledge, this is the first study to demonstrate that speaker-adaptive ATS can achieve performance that is not statistically different from speaker-dependent ATS.
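The description above adapts articulatory data across speakers with Procrustes matching. As a minimal sketch of classical Procrustes alignment (not the paper's implementation; the function name and the 2-D point layout are illustrative assumptions), one speaker's sensor coordinates can be mapped into another speaker's frame via translation, uniform scaling, and rotation:

```python
import numpy as np

def procrustes_align(source, target):
    """Align source points (N, 2) to target points (N, 2) by
    translation, uniform scaling, and rotation (classical Procrustes).
    Returns the aligned source points and the rotation matrix."""
    # Center both point sets at the origin
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    s0, t0 = source - mu_s, target - mu_t
    # Remove scale (Frobenius norm), remembering the target's scale
    ns, nt = np.linalg.norm(s0), np.linalg.norm(t0)
    s0, t0 = s0 / ns, t0 / nt
    # Optimal rotation from the SVD of the cross-covariance matrix
    u, _, vt = np.linalg.svd(s0.T @ t0)
    r = u @ vt
    # Rotate the source, then restore the target's scale and position
    aligned = (s0 @ r) * nt + mu_t
    return aligned, r
```

The rotation is the orthogonal-Procrustes solution; with a rigidly rotated, scaled, and shifted copy of a point set, the alignment recovers the original exactly.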
id pubmed-9416444
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9416444 2022-08-27 Speaker Adaptation on Articulation and Acoustics for Articulation-to-Speech Synthesis Cao, Beiming; Wisler, Alan; Wang, Jun. Sensors (Basel), Article. MDPI 2022-08-13 /pmc/articles/PMC9416444/ /pubmed/36015817 http://dx.doi.org/10.3390/s22166056 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
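The abstract's objective metric, mel-cepstral distortion (MCD), has a standard closed form, (10 / ln 10) · √(2 Σ_d (c_d − c′_d)²), averaged over time-aligned frames. A minimal sketch (the function name and array shapes are assumptions; the 0th energy coefficient is taken to be already excluded):

```python
import numpy as np

def mel_cepstral_distortion(mcc_ref, mcc_syn):
    """Mean MCD in dB between two time-aligned mel-cepstral
    coefficient matrices of shape (frames, order)."""
    diff = mcc_ref - mcc_syn
    # Standard MCD formula: (10 / ln 10) * sqrt(2 * sum of squared diffs)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return per_frame.mean()
```

Lower MCD indicates synthetic speech spectrally closer to the reference; identical coefficient matrices give an MCD of 0 dB.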
topic Article