Short-time speaker verification with different speaking style utterances

In recent years, great progress has been made in the technical aspects of automatic speaker verification (ASV). However, the promotion of ASV technology is still a very challenging issue, because most technologies are still very sensitive to new, unknown and spoofing conditions. Most previous studie...

Bibliographic Details
Main Authors: Mao, Hongwei, Shi, Yan, Liu, Yue, Wei, Linqiang, Li, Yijie, Long, Yanhua
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7657545/
https://www.ncbi.nlm.nih.gov/pubmed/33175898
http://dx.doi.org/10.1371/journal.pone.0241809
_version_ 1783608525752107008
author Mao, Hongwei
Shi, Yan
Liu, Yue
Wei, Linqiang
Li, Yijie
Long, Yanhua
author_facet Mao, Hongwei
Shi, Yan
Liu, Yue
Wei, Linqiang
Li, Yijie
Long, Yanhua
author_sort Mao, Hongwei
collection PubMed
description In recent years, great progress has been made in the technical aspects of automatic speaker verification (ASV). However, the promotion of ASV technology is still a very challenging issue, because most technologies are still very sensitive to new, unknown and spoofing conditions. Most previous studies focused on extracting target speaker information from natural speech. This paper aims to design a new ASV corpus with multiple speaking styles and investigate the robustness of ASV to these different speaking styles. We first release this corpus on the Zenodo website for public research, in which each speaker has several text-dependent and text-independent singing, humming and normal reading speech utterances. Then, we investigate the speaker discrimination of each speaking style in the feature space. Furthermore, the intra- and inter-speaker variabilities within each speaking style and across speaking styles are investigated in both text-dependent and text-independent ASV tasks. A conventional Gaussian Mixture Model (GMM) and the state-of-the-art x-vector are used to build ASV systems. Experimental results show that the voiceprint information in humming and singing speech is more distinguishable than that in normal reading speech for conventional ASV systems. Furthermore, we find that combining the three speaking styles can significantly improve the x-vector based ASV system, even when only limited gains are obtained by conventional GMM-based systems.
format Online
Article
Text
id pubmed-7657545
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7657545 2020-11-18 Short-time speaker verification with different speaking style utterances Mao, Hongwei Shi, Yan Liu, Yue Wei, Linqiang Li, Yijie Long, Yanhua PLoS One Research Article In recent years, great progress has been made in the technical aspects of automatic speaker verification (ASV). However, the promotion of ASV technology is still a very challenging issue, because most technologies are still very sensitive to new, unknown and spoofing conditions. Most previous studies focused on extracting target speaker information from natural speech. This paper aims to design a new ASV corpus with multiple speaking styles and investigate the robustness of ASV to these different speaking styles. We first release this corpus on the Zenodo website for public research, in which each speaker has several text-dependent and text-independent singing, humming and normal reading speech utterances. Then, we investigate the speaker discrimination of each speaking style in the feature space. Furthermore, the intra- and inter-speaker variabilities within each speaking style and across speaking styles are investigated in both text-dependent and text-independent ASV tasks. A conventional Gaussian Mixture Model (GMM) and the state-of-the-art x-vector are used to build ASV systems. Experimental results show that the voiceprint information in humming and singing speech is more distinguishable than that in normal reading speech for conventional ASV systems. Furthermore, we find that combining the three speaking styles can significantly improve the x-vector based ASV system, even when only limited gains are obtained by conventional GMM-based systems.
Public Library of Science 2020-11-11 /pmc/articles/PMC7657545/ /pubmed/33175898 http://dx.doi.org/10.1371/journal.pone.0241809 Text en © 2020 Mao et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Mao, Hongwei
Shi, Yan
Liu, Yue
Wei, Linqiang
Li, Yijie
Long, Yanhua
Short-time speaker verification with different speaking style utterances
title Short-time speaker verification with different speaking style utterances
title_full Short-time speaker verification with different speaking style utterances
title_fullStr Short-time speaker verification with different speaking style utterances
title_full_unstemmed Short-time speaker verification with different speaking style utterances
title_short Short-time speaker verification with different speaking style utterances
title_sort short-time speaker verification with different speaking style utterances
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7657545/
https://www.ncbi.nlm.nih.gov/pubmed/33175898
http://dx.doi.org/10.1371/journal.pone.0241809
work_keys_str_mv AT maohongwei shorttimespeakerverificationwithdifferentspeakingstyleutterances
AT shiyan shorttimespeakerverificationwithdifferentspeakingstyleutterances
AT liuyue shorttimespeakerverificationwithdifferentspeakingstyleutterances
AT weilinqiang shorttimespeakerverificationwithdifferentspeakingstyleutterances
AT liyijie shorttimespeakerverificationwithdifferentspeakingstyleutterances
AT longyanhua shorttimespeakerverificationwithdifferentspeakingstyleutterances