Cargando…
Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias
This paper describes a new unsupervised machine-learning method for simultaneous phoneme and word discovery from multiple speakers. Phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker, because the speech signals from different speakers exhibit d...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Frontiers Media S.A.
2019
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805918/ https://www.ncbi.nlm.nih.gov/pubmed/33501107 http://dx.doi.org/10.3389/frobt.2019.00092 |
Sumario: | This paper describes a new unsupervised machine-learning method for simultaneous phoneme and word discovery from multiple speakers. Phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker, because the speech signals from different speakers exhibit different acoustic features. The existing method, a nonparametric Bayesian double articulation analyzer (NPB-DAA) with deep sparse autoencoder (DSAE) only performed phoneme and word discovery from a single speaker. Extending NPB-DAA with DSAE to a multi-speaker scenario is, therefore, the research problem of this paper.This paper proposes the employment of a DSAE with parametric bias in the hidden layer (DSAE-PBHL) as a feature extractor for unsupervised phoneme and word discovery. DSAE-PBHL is designed to subtract speaker-dependent acoustic features and speaker-independent features by introducing parametric bias input to the DSAE hidden layer. An experiment demonstrated that DSAE-PBHL could subtract distributed representations of acoustic signals, enabling extraction based on the types of phonemes rather than the speakers. Another experiment demonstrated that a combination of NPB-DAA and DSAE-PBHL outperformed other available methods accomplishing phoneme and word discovery tasks involving speech signals with Japanese vowel sequences from multiple speakers. |
---|