Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task
Articulatory synthesis is one of the approaches used for modeling human speech production. In this study, we propose a model-based algorithm for learning the policy to control the vocal tract of the articulatory synthesizer in a vowel-to-vowel imitation task. Our method does not require external training data, since the policy is learned through interactions with the vocal tract model. To improve the sample efficiency of learning, we trained the model of speech production dynamics simultaneously with the policy. The policy was trained in a supervised way using predictions of the model of speech production dynamics. To stabilize the training, early stopping was incorporated into the algorithm. Additionally, we extracted acoustic features using an acoustic word embedding (AWE) model. This model was trained to discriminate between different words and to encode acoustics compactly while preserving the contextual information of the input. Our preliminary experiments showed that introducing this AWE model was crucial for guiding the policy toward a near-optimal solution. The acoustic embeddings obtained with the proposed approach proved useful as inputs to both the policy and the model of speech production dynamics.
Main Authors: Shitov, Denis; Pirogova, Elena; Wysocki, Tadeusz A.; Lech, Margaret
Format: Online Article Text
Language: English
Published: MDPI, 2023
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147087/ https://www.ncbi.nlm.nih.gov/pubmed/37050496 http://dx.doi.org/10.3390/s23073437
_version_ | 1785034732250071040 |
author | Shitov, Denis Pirogova, Elena Wysocki, Tadeusz A. Lech, Margaret |
author_facet | Shitov, Denis Pirogova, Elena Wysocki, Tadeusz A. Lech, Margaret |
author_sort | Shitov, Denis |
collection | PubMed |
description | Articulatory synthesis is one of the approaches used for modeling human speech production. In this study, we propose a model-based algorithm for learning the policy to control the vocal tract of the articulatory synthesizer in a vowel-to-vowel imitation task. Our method does not require external training data, since the policy is learned through interactions with the vocal tract model. To improve the sample efficiency of learning, we trained the model of speech production dynamics simultaneously with the policy. The policy was trained in a supervised way using predictions of the model of speech production dynamics. To stabilize the training, early stopping was incorporated into the algorithm. Additionally, we extracted acoustic features using an acoustic word embedding (AWE) model. This model was trained to discriminate between different words and to encode acoustics compactly while preserving the contextual information of the input. Our preliminary experiments showed that introducing this AWE model was crucial for guiding the policy toward a near-optimal solution. The acoustic embeddings obtained with the proposed approach proved useful as inputs to both the policy and the model of speech production dynamics. |
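As a rough illustration of the training scheme the abstract describes (a dynamics model learned simultaneously with a policy, the policy supervised by the model's predictions, and early stopping for stabilization), here is a minimal one-dimensional sketch. The "vocal tract" environment, all parameter names, and all constants below are hypothetical stand-ins, not the authors' actual model or implementation:

```python
import random

# Hypothetical stand-in for the articulatory synthesizer: a 1-D "articulator"
# whose true (unknown to the learner) response to action a is s' = s + 0.5 * a.
def true_dynamics(s, a):
    return s + 0.5 * a

def train(target=1.0, steps=500, lr_w=0.2, lr_k=0.2, patience=100):
    random.seed(0)
    w = 0.1  # learned dynamics model: s' ~= s + w * a
    k = 0.1  # linear policy: a = k * (target - s)
    best_err, best_k, streak = float("inf"), k, 0
    for _ in range(steps):
        # interact with the environment using an exploratory action
        s = random.uniform(-1.0, 1.0)
        a = random.uniform(-1.0, 1.0)
        s_next = true_dynamics(s, a)
        # 1) supervised update of the dynamics model on the observed transition
        w -= lr_w * (s + w * a - s_next) * a
        # 2) policy trained in a supervised way from the model's predictions:
        #    the model implies the action reaching the target is (target - s) / w,
        #    i.e. an ideal gain of 1 / w, so regress k toward it
        k -= lr_k * (k - 1.0 / w)
        # 3) early stopping on a fixed validation state (s = 0)
        err = abs(true_dynamics(0.0, k * target) - target)
        if err < best_err - 1e-6:
            best_err, best_k, streak = err, best_k if False else k, 0
        else:
            streak += 1
            if streak >= patience:
                break
    return w, best_k

w, k = train()
```

In this toy setting the learned dynamics gain `w` converges toward the true value 0.5, the policy gain `k` toward the model-implied optimum 1/w = 2, and training halts once the validation error stops improving for `patience` consecutive steps.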
format | Online Article Text |
id | pubmed-10147087 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10147087 2023-04-29 Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task Shitov, Denis Pirogova, Elena Wysocki, Tadeusz A. Lech, Margaret Sensors (Basel) Article MDPI 2023-03-24 /pmc/articles/PMC10147087/ /pubmed/37050496 http://dx.doi.org/10.3390/s23073437 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Shitov, Denis Pirogova, Elena Wysocki, Tadeusz A. Lech, Margaret Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task |
title | Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task |
title_full | Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task |
title_fullStr | Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task |
title_full_unstemmed | Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task |
title_short | Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task |
title_sort | deep reinforcement learning for articulatory synthesis in a vowel-to-vowel imitation task |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147087/ https://www.ncbi.nlm.nih.gov/pubmed/37050496 http://dx.doi.org/10.3390/s23073437 |
work_keys_str_mv | AT shitovdenis deepreinforcementlearningforarticulatorysynthesisinavoweltovowelimitationtask AT pirogovaelena deepreinforcementlearningforarticulatorysynthesisinavoweltovowelimitationtask AT wysockitadeusza deepreinforcementlearningforarticulatorysynthesisinavoweltovowelimitationtask AT lechmargaret deepreinforcementlearningforarticulatorysynthesisinavoweltovowelimitationtask |