
Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias

This paper describes a new unsupervised machine-learning method for simultaneous phoneme and word discovery from multiple speakers. Phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker, because the speech signals from different speakers exhibit d...


Bibliographic Details
Main authors: Nakashima, Ryo; Ozaki, Ryo; Taniguchi, Tadahiro
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Robotics and AI
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805918/
https://www.ncbi.nlm.nih.gov/pubmed/33501107
http://dx.doi.org/10.3389/frobt.2019.00092
_version_ 1783636412351905792
author Nakashima, Ryo
Ozaki, Ryo
Taniguchi, Tadahiro
author_facet Nakashima, Ryo
Ozaki, Ryo
Taniguchi, Tadahiro
author_sort Nakashima, Ryo
collection PubMed
description This paper describes a new unsupervised machine-learning method for simultaneous phoneme and word discovery from multiple speakers. Phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker, because the speech signals from different speakers exhibit different acoustic features. The existing method, a nonparametric Bayesian double articulation analyzer (NPB-DAA) with deep sparse autoencoder (DSAE), only performed phoneme and word discovery from a single speaker. Extending NPB-DAA with DSAE to a multi-speaker scenario is, therefore, the research problem of this paper. This paper proposes the employment of a DSAE with parametric bias in the hidden layer (DSAE-PBHL) as a feature extractor for unsupervised phoneme and word discovery. DSAE-PBHL is designed to subtract speaker-dependent acoustic features and extract speaker-independent features by introducing parametric bias input to the DSAE hidden layer. An experiment demonstrated that DSAE-PBHL could subtract distributed representations of acoustic signals, enabling extraction based on the types of phonemes rather than the speakers. Another experiment demonstrated that a combination of NPB-DAA and DSAE-PBHL outperformed other available methods accomplishing phoneme and word discovery tasks involving speech signals with Japanese vowel sequences from multiple speakers.
format Online
Article
Text
id pubmed-7805918
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7805918 2021-01-25 Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias Nakashima, Ryo Ozaki, Ryo Taniguchi, Tadahiro Front Robot AI Robotics and AI This paper describes a new unsupervised machine-learning method for simultaneous phoneme and word discovery from multiple speakers. Phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker, because the speech signals from different speakers exhibit different acoustic features. The existing method, a nonparametric Bayesian double articulation analyzer (NPB-DAA) with deep sparse autoencoder (DSAE), only performed phoneme and word discovery from a single speaker. Extending NPB-DAA with DSAE to a multi-speaker scenario is, therefore, the research problem of this paper. This paper proposes the employment of a DSAE with parametric bias in the hidden layer (DSAE-PBHL) as a feature extractor for unsupervised phoneme and word discovery. DSAE-PBHL is designed to subtract speaker-dependent acoustic features and extract speaker-independent features by introducing parametric bias input to the DSAE hidden layer. An experiment demonstrated that DSAE-PBHL could subtract distributed representations of acoustic signals, enabling extraction based on the types of phonemes rather than the speakers. Another experiment demonstrated that a combination of NPB-DAA and DSAE-PBHL outperformed other available methods accomplishing phoneme and word discovery tasks involving speech signals with Japanese vowel sequences from multiple speakers. Frontiers Media S.A. 2019-10-01 /pmc/articles/PMC7805918/ /pubmed/33501107 http://dx.doi.org/10.3389/frobt.2019.00092 Text en Copyright © 2019 Nakashima, Ozaki and Taniguchi. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Robotics and AI
Nakashima, Ryo
Ozaki, Ryo
Taniguchi, Tadahiro
Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias
title Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias
title_sort unsupervised phoneme and word discovery from multiple speakers using double articulation analyzer and neural network with parametric bias
topic Robotics and AI
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805918/
https://www.ncbi.nlm.nih.gov/pubmed/33501107
http://dx.doi.org/10.3389/frobt.2019.00092