Bidirectional Attention for Text-Dependent Speaker Verification

Automatic speaker verification provides a flexible and effective means of biometric authentication. Previous deep learning-based methods have shown promising results, but several problems remain open. In prior work on speaker-discriminative neural networks, the spe...

Bibliographic Details
Main authors: Fang, Xin; Gao, Tian; Zou, Liang; Ling, Zhenhua
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7730222/
https://www.ncbi.nlm.nih.gov/pubmed/33261046
http://dx.doi.org/10.3390/s20236784
_version_ 1783621633737490432
author Fang, Xin
Gao, Tian
Zou, Liang
Ling, Zhenhua
author_facet Fang, Xin
Gao, Tian
Zou, Liang
Ling, Zhenhua
author_sort Fang, Xin
collection PubMed
description Automatic speaker verification provides a flexible and effective means of biometric authentication. Previous deep learning-based methods have shown promising results, but several problems remain open. In prior work on speaker-discriminative neural networks, the representation of the target speaker is treated as fixed when compared with utterances from different speakers, and the joint information between enrollment and evaluation utterances is ignored. In this paper, we propose to combine CNN-based feature learning with a bidirectional attention mechanism to achieve better performance with only one enrollment utterance. The evaluation-enrollment joint information is exploited to provide interactive features through bidirectional attention. In addition, we introduce a separate cost function to identify the phonetic content, which helps compute the attention scores more precisely. These interactive features are complementary to the constant ones, which are extracted from each speaker separately and do not vary with the evaluation utterance. The proposed method achieved a competitive equal error rate of 6.26% on the internal “DAN DAN NI HAO” benchmark dataset with 1250 utterances and outperformed various baseline methods, including the traditional i-vector/PLDA, d-vector, self-attention, and sequence-to-sequence attention models.
format Online
Article
Text
id pubmed-7730222
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-77302222020-12-12 Bidirectional Attention for Text-Dependent Speaker Verification Fang, Xin Gao, Tian Zou, Liang Ling, Zhenhua Sensors (Basel) Article Automatic speaker verification provides a flexible and effective means of biometric authentication. Previous deep learning-based methods have shown promising results, but several problems remain open. In prior work on speaker-discriminative neural networks, the representation of the target speaker is treated as fixed when compared with utterances from different speakers, and the joint information between enrollment and evaluation utterances is ignored. In this paper, we propose to combine CNN-based feature learning with a bidirectional attention mechanism to achieve better performance with only one enrollment utterance. The evaluation-enrollment joint information is exploited to provide interactive features through bidirectional attention. In addition, we introduce a separate cost function to identify the phonetic content, which helps compute the attention scores more precisely. These interactive features are complementary to the constant ones, which are extracted from each speaker separately and do not vary with the evaluation utterance. The proposed method achieved a competitive equal error rate of 6.26% on the internal “DAN DAN NI HAO” benchmark dataset with 1250 utterances and outperformed various baseline methods, including the traditional i-vector/PLDA, d-vector, self-attention, and sequence-to-sequence attention models. MDPI 2020-11-27 /pmc/articles/PMC7730222/ /pubmed/33261046 http://dx.doi.org/10.3390/s20236784 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Fang, Xin
Gao, Tian
Zou, Liang
Ling, Zhenhua
Bidirectional Attention for Text-Dependent Speaker Verification
title Bidirectional Attention for Text-Dependent Speaker Verification
title_full Bidirectional Attention for Text-Dependent Speaker Verification
title_fullStr Bidirectional Attention for Text-Dependent Speaker Verification
title_full_unstemmed Bidirectional Attention for Text-Dependent Speaker Verification
title_short Bidirectional Attention for Text-Dependent Speaker Verification
title_sort bidirectional attention for text-dependent speaker verification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7730222/
https://www.ncbi.nlm.nih.gov/pubmed/33261046
http://dx.doi.org/10.3390/s20236784
work_keys_str_mv AT fangxin bidirectionalattentionfortextdependentspeakerverification
AT gaotian bidirectionalattentionfortextdependentspeakerverification
AT zouliang bidirectionalattentionfortextdependentspeakerverification
AT lingzhenhua bidirectionalattentionfortextdependentspeakerverification