
Few-shot short utterance speaker verification using meta-learning

Short utterance speaker verification (SV) in practical applications is the task of accepting or rejecting a speaker's identity claim based on a few enrollment utterances. Traditional methods have used deep neural networks to extract speaker representations for verification. Recently, several me...


Bibliographic Details
Main Authors:	Wang, Weijie; Zhao, Hong; Yang, Yikun; Chang, YouKang; You, Haojie
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2023
Subjects:	Artificial Intelligence
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280689/ https://www.ncbi.nlm.nih.gov/pubmed/37346533 http://dx.doi.org/10.7717/peerj-cs.1276

author	Wang, Weijie; Zhao, Hong; Yang, Yikun; Chang, YouKang; You, Haojie
collection	PubMed
description	Short utterance speaker verification (SV) in practical applications is the task of accepting or rejecting a speaker's identity claim based on a few enrollment utterances. Traditional methods have used deep neural networks to extract speaker representations for verification. Recently, several meta-learning approaches have learned a deep distance metric to distinguish speakers within meta-tasks. Among them, a prototypical network learns a metric space in which a speaker's identity is classified by computing the distance to each speaker's prototype center. We use emphasized channel attention, propagation and aggregation in TDNN (ECAPA-TDNN) to implement the function the prototypical network requires: a nonlinear mapping from the input space to the metric space for the few-shot SV task. In addition, optimizing only for the speakers in a given meta-task may not be sufficient to learn distinctive speaker features. We therefore use an episodic training strategy in which the classes of the support and query sets correspond to the classes of the entire training set, which further improves model performance. The proposed model outperforms the comparison models on the VoxCeleb1 dataset and has a wide range of practical applications.
format	Online Article Text
id	pubmed-10280689
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-10280689 2023-06-21 PeerJ Comput Sci Artificial Intelligence PeerJ Inc. 2023-04-21 /pmc/articles/PMC10280689/ /pubmed/37346533 http://dx.doi.org/10.7717/peerj-cs.1276 Text en ©2023 Wang et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	Few-shot short utterance speaker verification using meta-learning
topic	Artificial Intelligence
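The description above says that a prototypical network classifies a speaker by the distance to each speaker's prototype center in a learned metric space. The following is a minimal sketch of that decision rule only, not the authors' implementation. It assumes embeddings are already available (toy 2-D vectors stand in for ECAPA-TDNN outputs) and uses squared Euclidean distance, which is an assumption because the record does not name the metric. The function names `prototypes` and `classify` are hypothetical.

```python
import numpy as np


def prototypes(support_emb, support_labels):
    """Return the class ids and one prototype (mean embedding) per speaker."""
    classes = np.unique(support_labels)
    protos = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in classes]
    )
    return classes, protos


def classify(query_emb, classes, protos):
    """Assign each query embedding to the speaker with the nearest prototype."""
    # Squared Euclidean distance, shape (n_query, n_classes).
    # The choice of metric is an assumption made for this sketch.
    diff = query_emb[:, None, :] - protos[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    return classes[dist.argmin(axis=1)], dist


# Toy stand-in for speaker embeddings: two speakers, three support
# utterances each, clustered around well-separated centers.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
support_labels = np.array([0, 0, 0, 1, 1, 1])
support_emb = centers[support_labels] + rng.normal(scale=0.1, size=(6, 2))

# Two query utterances: the first from speaker 1, the second from speaker 0.
query_emb = centers[[1, 0]] + rng.normal(scale=0.1, size=(2, 2))

classes, protos = prototypes(support_emb, support_labels)
pred, dist = classify(query_emb, classes, protos)
print(pred)
```

In a verification setting, the distance between a test embedding and the claimed speaker's prototype would be compared against a threshold rather than taking the nearest class; the threshold would have to be tuned on held-out data.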

