
Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task

Articulatory synthesis is one of the approaches used for modeling human speech production. In this study, we propose a model-based algorithm for learning the policy to control the vocal tract of the articulatory synthesizer in a vowel-to-vowel imitation task. Our method does not require external training data, since the policy is learned through interactions with the vocal tract model. To improve the sample efficiency of the learning, we trained the model of speech production dynamics simultaneously with the policy. The policy was trained in a supervised way using predictions of the model of speech production dynamics. To stabilize the training, early stopping was incorporated into the algorithm. Additionally, we extracted acoustic features using an acoustic word embedding (AWE) model. This model was trained to discriminate between different words and to enable compact encoding of acoustics while preserving contextual information of the input. Our preliminary experiments showed that introducing this AWE model was crucial to guide the policy toward a near-optimal solution. The acoustic embeddings, obtained using the proposed approach, were revealed to be useful when applied as inputs to the policy and the model of speech production dynamics.
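
To make the loop described in the abstract easier to picture, here is a minimal PyTorch sketch of that general shape: a dynamics model is fitted to observed transitions while the policy is updated in a supervised way through the (frozen) dynamics model's predictions, with early stopping on the imitation error. This is an illustration under assumptions, not the authors' implementation: the vocal-tract synthesizer and pretrained AWE encoder are replaced by a toy synthesize_embedding() stand-in, and all network sizes, learning rates, and the stopping threshold are arbitrary placeholders.

```python
# Illustrative sketch only (NOT the authors' code): model-based policy learning
# where a dynamics model of the synthesizer is trained alongside the policy.
import torch
import torch.nn as nn
import torch.nn.functional as F

ART_DIM, EMB_DIM, BATCH = 30, 64, 32   # articulatory params, AWE size, batch (assumed)

policy = nn.Sequential(nn.Linear(2 * EMB_DIM, 128), nn.Tanh(), nn.Linear(128, ART_DIM))
dynamics = nn.Sequential(nn.Linear(ART_DIM + EMB_DIM, 128), nn.ReLU(), nn.Linear(128, EMB_DIM))
opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-3)
opt_dyn = torch.optim.Adam(dynamics.parameters(), lr=1e-3)

PROJ = torch.randn(ART_DIM, EMB_DIM)   # toy stand-in only

def synthesize_embedding(action, current_emb):
    """Placeholder for the real pipeline: drive the articulatory synthesizer
    with `action` and encode the resulting audio with the pretrained AWE model."""
    return torch.tanh(current_emb + 0.1 * action @ PROJ)

for step in range(1000):
    current = torch.randn(BATCH, EMB_DIM)   # AWE embedding of the current vowel
    target = torch.randn(BATCH, EMB_DIM)    # AWE embedding of the vowel to imitate

    # 1) Act on the vocal-tract model and observe the resulting acoustics.
    with torch.no_grad():
        action = policy(torch.cat([current, target], dim=-1))
    next_emb = synthesize_embedding(action, current)

    # 2) Fit the dynamics model on the observed transition (state, action) -> next state.
    dyn_loss = F.mse_loss(dynamics(torch.cat([action, current], dim=-1)), next_emb)
    opt_dyn.zero_grad()
    dyn_loss.backward()
    opt_dyn.step()

    # 3) Supervised policy update through the frozen dynamics model: push the
    #    predicted outcome of the policy's action toward the target embedding.
    for p in dynamics.parameters():
        p.requires_grad_(False)
    planned = policy(torch.cat([current, target], dim=-1))
    pred = dynamics(torch.cat([planned, current], dim=-1))
    pi_loss = F.mse_loss(pred, target)
    opt_pi.zero_grad()
    pi_loss.backward()
    opt_pi.step()
    for p in dynamics.parameters():
        p.requires_grad_(True)

    # Early stopping once the imitation error is small enough (threshold arbitrary here).
    if pi_loss.item() < 1e-3:
        break
```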


Bibliographic Details
Main Authors: Shitov, Denis; Pirogova, Elena; Wysocki, Tadeusz A.; Lech, Margaret
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147087/
https://www.ncbi.nlm.nih.gov/pubmed/37050496
http://dx.doi.org/10.3390/s23073437
author Shitov, Denis
Pirogova, Elena
Wysocki, Tadeusz A.
Lech, Margaret
collection PubMed
description Articulatory synthesis is one of the approaches used for modeling human speech production. In this study, we propose a model-based algorithm for learning the policy to control the vocal tract of the articulatory synthesizer in a vowel-to-vowel imitation task. Our method does not require external training data, since the policy is learned through interactions with the vocal tract model. To improve the sample efficiency of the learning, we trained the model of speech production dynamics simultaneously with the policy. The policy was trained in a supervised way using predictions of the model of speech production dynamics. To stabilize the training, early stopping was incorporated into the algorithm. Additionally, we extracted acoustic features using an acoustic word embedding (AWE) model. This model was trained to discriminate between different words and to enable compact encoding of acoustics while preserving contextual information of the input. Our preliminary experiments showed that introducing this AWE model was crucial to guide the policy toward a near-optimal solution. The acoustic embeddings, obtained using the proposed approach, were revealed to be useful when applied as inputs to the policy and the model of speech production dynamics.
format Online
Article
Text
id pubmed-10147087
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10147087 2023-04-29 Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task. Shitov, Denis; Pirogova, Elena; Wysocki, Tadeusz A.; Lech, Margaret. Sensors (Basel), Article. MDPI 2023-03-24 /pmc/articles/PMC10147087/ /pubmed/37050496 http://dx.doi.org/10.3390/s23073437 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147087/
https://www.ncbi.nlm.nih.gov/pubmed/37050496
http://dx.doi.org/10.3390/s23073437