Cargando…
Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias
This paper describes a new unsupervised machine-learning method for simultaneous phoneme and word discovery from multiple speakers. Phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker, because the speech signals from different speakers exhibit d...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Frontiers Media S.A.
2019
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805918/ https://www.ncbi.nlm.nih.gov/pubmed/33501107 http://dx.doi.org/10.3389/frobt.2019.00092 |
_version_ | 1783636412351905792 |
---|---|
author | Nakashima, Ryo Ozaki, Ryo Taniguchi, Tadahiro |
author_facet | Nakashima, Ryo Ozaki, Ryo Taniguchi, Tadahiro |
author_sort | Nakashima, Ryo |
collection | PubMed |
description | This paper describes a new unsupervised machine-learning method for simultaneous phoneme and word discovery from multiple speakers. Phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker, because the speech signals from different speakers exhibit different acoustic features. The existing method, a nonparametric Bayesian double articulation analyzer (NPB-DAA) with deep sparse autoencoder (DSAE) only performed phoneme and word discovery from a single speaker. Extending NPB-DAA with DSAE to a multi-speaker scenario is, therefore, the research problem of this paper.This paper proposes the employment of a DSAE with parametric bias in the hidden layer (DSAE-PBHL) as a feature extractor for unsupervised phoneme and word discovery. DSAE-PBHL is designed to subtract speaker-dependent acoustic features and speaker-independent features by introducing parametric bias input to the DSAE hidden layer. An experiment demonstrated that DSAE-PBHL could subtract distributed representations of acoustic signals, enabling extraction based on the types of phonemes rather than the speakers. Another experiment demonstrated that a combination of NPB-DAA and DSAE-PBHL outperformed other available methods accomplishing phoneme and word discovery tasks involving speech signals with Japanese vowel sequences from multiple speakers. |
format | Online Article Text |
id | pubmed-7805918 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-78059182021-01-25 Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias Nakashima, Ryo Ozaki, Ryo Taniguchi, Tadahiro Front Robot AI Robotics and AI This paper describes a new unsupervised machine-learning method for simultaneous phoneme and word discovery from multiple speakers. Phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker, because the speech signals from different speakers exhibit different acoustic features. The existing method, a nonparametric Bayesian double articulation analyzer (NPB-DAA) with deep sparse autoencoder (DSAE) only performed phoneme and word discovery from a single speaker. Extending NPB-DAA with DSAE to a multi-speaker scenario is, therefore, the research problem of this paper.This paper proposes the employment of a DSAE with parametric bias in the hidden layer (DSAE-PBHL) as a feature extractor for unsupervised phoneme and word discovery. DSAE-PBHL is designed to subtract speaker-dependent acoustic features and speaker-independent features by introducing parametric bias input to the DSAE hidden layer. An experiment demonstrated that DSAE-PBHL could subtract distributed representations of acoustic signals, enabling extraction based on the types of phonemes rather than the speakers. Another experiment demonstrated that a combination of NPB-DAA and DSAE-PBHL outperformed other available methods accomplishing phoneme and word discovery tasks involving speech signals with Japanese vowel sequences from multiple speakers. Frontiers Media S.A. 2019-10-01 /pmc/articles/PMC7805918/ /pubmed/33501107 http://dx.doi.org/10.3389/frobt.2019.00092 Text en Copyright © 2019 Nakashima, Ozaki and Taniguchi. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Robotics and AI Nakashima, Ryo Ozaki, Ryo Taniguchi, Tadahiro Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias |
title | Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias |
title_full | Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias |
title_fullStr | Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias |
title_full_unstemmed | Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias |
title_short | Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias |
title_sort | unsupervised phoneme and word discovery from multiple speakers using double articulation analyzer and neural network with parametric bias |
topic | Robotics and AI |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805918/ https://www.ncbi.nlm.nih.gov/pubmed/33501107 http://dx.doi.org/10.3389/frobt.2019.00092 |
work_keys_str_mv | AT nakashimaryo unsupervisedphonemeandworddiscoveryfrommultiplespeakersusingdoublearticulationanalyzerandneuralnetworkwithparametricbias AT ozakiryo unsupervisedphonemeandworddiscoveryfrommultiplespeakersusingdoublearticulationanalyzerandneuralnetworkwithparametricbias AT taniguchitadahiro unsupervisedphonemeandworddiscoveryfrommultiplespeakersusingdoublearticulationanalyzerandneuralnetworkwithparametricbias |