Generative Adversarial Phonology: Modeling Unsupervised Phonetic and Phonological Learning With Neural Networks


Bibliographic Details

Main Author: Beguš, Gašper
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2020
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7861218/
https://www.ncbi.nlm.nih.gov/pubmed/33733161
http://dx.doi.org/10.3389/frai.2020.00044
Collection: PubMed
Description:
Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations. This paper argues that acquisition of speech can be modeled as a dependency between random space and generated speech data in the Generative Adversarial Network architecture and proposes a methodology to uncover the network's internal representations that correspond to phonetic and phonological properties. The Generative Adversarial architecture is uniquely appropriate for modeling phonetic and phonological learning because the network is trained on unannotated raw acoustic data and learning is unsupervised without any language-specific assumptions or pre-assumed levels of abstraction. A Generative Adversarial Network was trained on an allophonic distribution in English, in which voiceless stops surface as aspirated word-initially before stressed vowels, except if preceded by a sibilant [s]. The network successfully learns the allophonic alternation: the network's generated speech signal contains the conditional distribution of aspiration duration. The paper proposes a technique for establishing the network's internal representations that identifies latent variables that correspond to, for example, presence of [s] and its spectral properties. By manipulating these variables, we actively control the presence of [s] and its frication amplitude in the generated outputs. This suggests that the network learns to use latent variables as an approximation of phonetic and phonological representations. Crucially, we observe that the dependencies learned in training extend beyond the training interval, which allows for additional exploration of learning representations.

The paper also discusses how the network's architecture and innovative outputs resemble and differ from linguistic behavior in language acquisition, speech disorders, and speech errors, and how well-understood dependencies in speech data can help us interpret how neural networks learn their representations.
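The latent-manipulation methodology the abstract describes can be illustrated with a minimal sketch. The toy linear "generator" below is an assumption for illustration only (the paper trains a convolutional GAN on raw acoustic data); the dimension index `S_DIM` is likewise hypothetical. The sketch shows the core idea: fix one latent variable to chosen values, including values outside the training interval, and compare outputs to isolate what that variable encodes (e.g., presence and amplitude of [s]).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained GAN generator: a fixed linear map from a
# 100-dim latent space to a 2048-sample "waveform". Purely illustrative.
LATENT_DIM, N_SAMPLES = 100, 2048
W = rng.normal(size=(N_SAMPLES, LATENT_DIM))

def generate(z):
    """Map a latent vector to an output 'waveform' (illustrative only)."""
    return W @ z

def manipulate(z, dim, value):
    """Force one latent variable to a chosen value, as in the paper's
    technique of setting variables to marginal levels to reveal the
    phonetic property they encode."""
    z = z.copy()
    z[dim] = value
    return z

z = rng.uniform(-1, 1, LATENT_DIM)  # latent values sampled from the training interval [-1, 1]
S_DIM = 7                           # hypothetical dimension encoding [s]

# Setting the variable to 0 vs. a value beyond the training interval;
# the difference between the two outputs isolates that variable's
# acoustic contribution (here, exactly 5.0 * W[:, S_DIM]).
quiet = generate(manipulate(z, S_DIM, 0.0))
loud = generate(manipulate(z, S_DIM, 5.0))  # outside [-1, 1]
effect = loud - quiet
```

Because this sketch is linear, the effect of the manipulated dimension extrapolates perfectly beyond the training interval; in the actual nonlinear network, the paper reports that the learned dependency likewise extends past the training range, which is what licenses this kind of probing.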
Record ID: pubmed-7861218
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Front Artif Intell (Artificial Intelligence)
Published online: 2020-07-08
Copyright © 2020 Beguš. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY, http://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.