
Adversarially Learned Total Variability Embedding for Speaker Recognition with Random Digit Strings

In recent years, driven by the increasing demand for voice-based authentication systems, a considerable amount of research has investigated methods for verifying users with a short, randomized pass-phrase. In this paper, we propose a novel technique for extracting an i-vector-like feature based on...


Bibliographic Details
Main Authors:	Kang, Woo Hyun; Kim, Nam Soo
Format:	Online Article Text
Language:	English
Published:	MDPI 2019
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6864864/ https://www.ncbi.nlm.nih.gov/pubmed/31671509 http://dx.doi.org/10.3390/s19214709

author	Kang, Woo Hyun; Kim, Nam Soo
collection	PubMed
description	In recent years, driven by the increasing demand for voice-based authentication systems, a considerable amount of research has investigated methods for verifying users with a short, randomized pass-phrase. In this paper, we propose a novel technique for extracting an i-vector-like feature based on an adversarially learned inference (ALI) model, which summarizes the variability within the Gaussian mixture model (GMM) distribution through a nonlinear process. Analogous to the previously proposed variational autoencoder (VAE)-based feature extractor, the proposed ALI-based model is trained to generate the GMM supervector according to the maximum likelihood criterion, given the Baum–Welch statistics of the input utterance. However, to prevent the potential loss of information caused by the Kullback–Leibler divergence (KL divergence) regularization adopted in VAE-based model training, the proposed ALI-based feature extractor instead uses a joint discriminator to make the generated latent variable and GMM supervector more realistic. The proposed framework is compared with the conventional i-vector and VAE-based methods on the TIDIGITS dataset. Experimental results show that the proposed method represents the uncertainty caused by short utterance duration better than the VAE-based method. Furthermore, the proposed approach performed well when combined with the standard i-vector framework.
format	Online Article Text
id	pubmed-6864864
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	MDPI
record_format	MEDLINE/PubMed
journal	Sensors (Basel), MDPI, 2019-10-30
rights	© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
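The abstract states that the extractors it compares (i-vector, VAE-based, and the proposed ALI-based model) work from the Baum–Welch statistics of an utterance, and that VAE training adds a KL divergence regularizer. The sketch below illustrates both ingredients under common i-vector conventions. It assumes a diagonal-covariance GMM universal background model and centered first-order statistics, and all function names and toy sizes are hypothetical. It is not the authors' code and does not reproduce their ALI encoder, generator, or joint discriminator.

```python
import numpy as np


def log_gaussian_diag(frames, means, variances):
    """Log-density of each frame under each diagonal-covariance Gaussian.

    frames: (T, D); means, variances: (C, D). Returns a (T, C) array.
    """
    diff = frames[:, None, :] - means[None, :, :]                     # (T, C, D)
    mahalanobis = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, C)
    log_det = np.sum(np.log(variances), axis=1)                      # (C,)
    dim = frames.shape[1]
    return -0.5 * (dim * np.log(2.0 * np.pi) + log_det[None, :] + mahalanobis)


def baum_welch_stats(frames, weights, means, variances):
    """Zeroth-order N_c and centered first-order F_c statistics for one utterance.

    Assumption: a diagonal-covariance GMM (UBM) with the given weights, means, variances.
    """
    log_joint = np.log(weights)[None, :] + log_gaussian_diag(frames, means, variances)
    # Frame-level component posteriors gamma_c(t), normalized stably (log-sum-exp trick).
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)
    n = post.sum(axis=0)                       # N_c = sum_t gamma_c(t), shape (C,)
    f = post.T @ frames - n[:, None] * means   # F_c = sum_t gamma_c(t) (x_t - mu_c), shape (C, D)
    return n, f


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    This is the usual Gaussian VAE regularizer; the paper's exact VAE setup may differ.
    """
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_mix, dim, num_frames = 4, 3, 50          # hypothetical toy sizes
    weights = np.full(num_mix, 1.0 / num_mix)
    means = rng.normal(size=(num_mix, dim))
    variances = np.ones((num_mix, dim))
    frames = rng.normal(size=(num_frames, dim))  # stand-in for acoustic feature frames

    n, f = baum_welch_stats(frames, weights, means, variances)
    stacked_stats = f.reshape(-1)                # stacked F_c, length C * D
    print(n.sum(), stacked_stats.shape)
    print(kl_to_standard_normal(np.zeros(dim), np.zeros(dim)))
```

Because each frame's posteriors sum to one, the zeroth-order statistics sum to the number of frames. Stacking the C first-order statistics gives a CD-dimensional vector with the same layout as a GMM supervector. In a real system the UBM parameters would come from EM training on background speech rather than random draws.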


Similar Items