Bidirectional Attention for Text-Dependent Speaker Verification

Automatic speaker verification provides a flexible and effective way for biometric authentication. Previous deep learning-based methods have demonstrated promising results, whereas a few problems still require better solutions. In prior works examining speaker discriminative neural networks, the spe...

Bibliographic Details
Main Authors:	Fang, Xin; Gao, Tian; Zou, Liang; Ling, Zhenhua
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7730222/ https://www.ncbi.nlm.nih.gov/pubmed/33261046 http://dx.doi.org/10.3390/s20236784

_version_	1783621633737490432
author	Fang, Xin Gao, Tian Zou, Liang Ling, Zhenhua
author_facet	Fang, Xin Gao, Tian Zou, Liang Ling, Zhenhua
author_sort	Fang, Xin
collection	PubMed
description	Automatic speaker verification provides a flexible and effective way for biometric authentication. Previous deep learning-based methods have demonstrated promising results, whereas a few problems still require better solutions. In prior works examining speaker discriminative neural networks, the speaker representation of the target speaker is regarded as a fixed one when comparing with utterances from different speakers, and the joint information between enrollment and evaluation utterances is ignored. In this paper, we propose to combine CNN-based feature learning with a bidirectional attention mechanism to achieve better performance with only one enrollment utterance. The evaluation-enrollment joint information is exploited to provide interactive features through bidirectional attention. In addition, we introduce one individual cost function to identify the phonetic contents, which contributes to calculating the attention score more specifically. These interactive features are complementary to the constant ones, which are extracted from individual speakers separately and do not vary with the evaluation utterances. The proposed method achieved a competitive equal error rate of 6.26% on the internal “DAN DAN NI HAO” benchmark dataset with 1250 utterances and outperformed various baseline methods, including the traditional i-vector/PLDA, d-vector, self-attention, and sequence-to-sequence attention models.
format	Online Article Text
id	pubmed-7730222
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7730222 2020-12-12 Bidirectional Attention for Text-Dependent Speaker Verification Fang, Xin Gao, Tian Zou, Liang Ling, Zhenhua Sensors (Basel) Article Automatic speaker verification provides a flexible and effective way for biometric authentication. Previous deep learning-based methods have demonstrated promising results, whereas a few problems still require better solutions. In prior works examining speaker discriminative neural networks, the speaker representation of the target speaker is regarded as a fixed one when comparing with utterances from different speakers, and the joint information between enrollment and evaluation utterances is ignored. In this paper, we propose to combine CNN-based feature learning with a bidirectional attention mechanism to achieve better performance with only one enrollment utterance. The evaluation-enrollment joint information is exploited to provide interactive features through bidirectional attention. In addition, we introduce one individual cost function to identify the phonetic contents, which contributes to calculating the attention score more specifically. These interactive features are complementary to the constant ones, which are extracted from individual speakers separately and do not vary with the evaluation utterances. The proposed method achieved a competitive equal error rate of 6.26% on the internal “DAN DAN NI HAO” benchmark dataset with 1250 utterances and outperformed various baseline methods, including the traditional i-vector/PLDA, d-vector, self-attention, and sequence-to-sequence attention models. MDPI 2020-11-27 /pmc/articles/PMC7730222/ /pubmed/33261046 http://dx.doi.org/10.3390/s20236784 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Fang, Xin Gao, Tian Zou, Liang Ling, Zhenhua Bidirectional Attention for Text-Dependent Speaker Verification
title	Bidirectional Attention for Text-Dependent Speaker Verification
title_full	Bidirectional Attention for Text-Dependent Speaker Verification
title_fullStr	Bidirectional Attention for Text-Dependent Speaker Verification
title_full_unstemmed	Bidirectional Attention for Text-Dependent Speaker Verification
title_short	Bidirectional Attention for Text-Dependent Speaker Verification
title_sort	bidirectional attention for text-dependent speaker verification
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7730222/ https://www.ncbi.nlm.nih.gov/pubmed/33261046 http://dx.doi.org/10.3390/s20236784
work_keys_str_mv	AT fangxin bidirectionalattentionfortextdependentspeakerverification AT gaotian bidirectionalattentionfortextdependentspeakerverification AT zouliang bidirectionalattentionfortextdependentspeakerverification AT lingzhenhua bidirectionalattentionfortextdependentspeakerverification

Similar Items