TorchDIVA: An extensible computational model of speech production built on an open-source machine learning library

The DIVA model is a computational model of speech motor control that combines a simulation of the brain regions responsible for speech production with a model of the human vocal tract. The model is currently implemented in Matlab Simulink; however, this is less than ideal as most of the development...

Bibliographic Details
Main Authors:	Kinahan, Sean P., Liss, Julie M., Berisha, Visar
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2023
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9937462/ https://www.ncbi.nlm.nih.gov/pubmed/36800358 http://dx.doi.org/10.1371/journal.pone.0281306

_version_	1784890429306568704
author	Kinahan, Sean P. Liss, Julie M. Berisha, Visar
collection	PubMed
description	The DIVA model is a computational model of speech motor control that combines a simulation of the brain regions responsible for speech production with a model of the human vocal tract. The model is currently implemented in Matlab Simulink; however, this is less than ideal as most of the development in speech technology research is done in Python. This means there is a wealth of machine learning tools which are freely available in the Python ecosystem that cannot be easily integrated with DIVA. We present TorchDIVA, a full rebuild of DIVA in Python using PyTorch tensors. DIVA source code was directly translated from Matlab to Python, and built-in Simulink signal blocks were implemented from scratch. After implementation, the accuracy of each module was evaluated via systematic block-by-block validation. The TorchDIVA model is shown to produce outputs that closely match those of the original DIVA model, with a negligible difference between the two. We additionally present an example of the extensibility of TorchDIVA as a research platform. Speech quality enhancement in TorchDIVA is achieved through an integration with an existing PyTorch generative vocoder called DiffWave. A modified DiffWave mel-spectrum upsampler was trained on human speech waveforms and conditioned on the TorchDIVA speech production. The results indicate improved speech quality metrics in the DiffWave-enhanced output as compared to the baseline. This enhancement would have been difficult or impossible to accomplish in the original Matlab implementation. This proof-of-concept demonstrates the value TorchDIVA can bring to the research community. Researchers can download the new implementation at: https://github.com/skinahan/DIVA_PyTorch.
format	Online Article Text
id	pubmed-9937462
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9937462 2023-02-18 TorchDIVA: An extensible computational model of speech production built on an open-source machine learning library Kinahan, Sean P. Liss, Julie M. Berisha, Visar PLoS One Research Article
Public Library of Science 2023-02-17 /pmc/articles/PMC9937462/ /pubmed/36800358 http://dx.doi.org/10.1371/journal.pone.0281306 Text en © 2023 Kinahan et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	TorchDIVA: An extensible computational model of speech production built on an open-source machine learning library
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9937462/ https://www.ncbi.nlm.nih.gov/pubmed/36800358 http://dx.doi.org/10.1371/journal.pone.0281306

Similar Items