A Pre-Training Framework Based on Multi-Order Acoustic Simulation for Replay Voice Spoofing Detection

Voice spoofing attempts to break into a specific automatic speaker verification (ASV) system by forging the user’s voice and can be carried out through methods such as text-to-speech (TTS), voice conversion (VC), and replay attacks. Recently, deep learning-based voice spoofing countermeasures have been dev...

Bibliographic Details
Main Authors:	Go, Changhwan; Park, Nam In; Jeon, Oc-Yeub; Chun, Chanjun
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Communication
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10458210/ https://www.ncbi.nlm.nih.gov/pubmed/37631815 http://dx.doi.org/10.3390/s23167280

_version_	1785097112805965824
author	Go, Changhwan Park, Nam In Jeon, Oc-Yeub Chun, Chanjun
author_sort	Go, Changhwan
collection	PubMed
description	Voice spoofing attempts to break into a specific automatic speaker verification (ASV) system by forging the user’s voice and can be carried out through methods such as text-to-speech (TTS), voice conversion (VC), and replay attacks. Recently, deep learning-based voice spoofing countermeasures have been developed. However, large replay datasets are difficult to construct because replay requires a physical recording process. To overcome this problem, this study proposes a pre-training framework based on multi-order acoustic simulation for replay voice spoofing detection. Multi-order acoustic simulation uses existing clean-signal and room impulse response (RIR) datasets to generate audio that simulates the various acoustic configurations of original and replayed audio. The acoustic configuration refers to factors such as the microphone type, reverberation, time delay, and noise that may occur between a speaker and a microphone during the recording process. We assume that a deep learning model trained on audio that simulates these acoustic configurations can classify the acoustic configurations of original and replayed audio well. To validate this, we performed pre-training to classify the audio generated by the multi-order acoustic simulation into three classes: clean signal, audio simulating the acoustic configuration of the original audio, and audio simulating the acoustic configuration of the replayed audio. We then used the weights of the pre-trained model as the initial weights of the replay voice spoofing detection model and fine-tuned it on an existing replay voice spoofing dataset. To validate the effectiveness of the proposed method, we compared the conventional method without pre-training and the proposed method using two objective metrics, accuracy and F1-score. The conventional method achieved an accuracy of 92.94% and an F1-score of 86.92%, whereas the proposed method achieved an accuracy of 98.16% and an F1-score of 95.08%.
format	Online Article Text
id	pubmed-10458210
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10458210 2023-08-27 A Pre-Training Framework Based on Multi-Order Acoustic Simulation for Replay Voice Spoofing Detection Go, Changhwan Park, Nam In Jeon, Oc-Yeub Chun, Chanjun Sensors (Basel) Communication MDPI 2023-08-20 /pmc/articles/PMC10458210/ /pubmed/37631815 http://dx.doi.org/10.3390/s23167280 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	A Pre-Training Framework Based on Multi-Order Acoustic Simulation for Replay Voice Spoofing Detection
topic	Communication
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10458210/ https://www.ncbi.nlm.nih.gov/pubmed/37631815 http://dx.doi.org/10.3390/s23167280
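
The multi-order acoustic simulation summarized in the description field can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "order" means the number of successive recording passes (one pass for original audio, two passes for replayed audio), that each pass is a time delay, a convolution with an RIR, and additive white noise, and that microphone characteristics are folded into the RIR. The helper names (`simulate_recording`, `multi_order_simulation`, `make_toy_rir`), the toy RIRs, and the SNR value are all hypothetical.

```python
import numpy as np

# Three pre-training classes named in the abstract; the integer labels are an assumption.
CLASS_NAMES = {0: "clean", 1: "original-like", 2: "replay-like"}


def simulate_recording(signal, rir, delay_samples=0, snr_db=30.0, rng=None):
    """One hypothetical recording pass: time delay, reverberation, additive noise."""
    rng = np.random.default_rng() if rng is None else rng
    delayed = np.concatenate([np.zeros(delay_samples), signal])
    reverberant = np.convolve(delayed, rir)  # full linear convolution with the RIR
    signal_power = np.mean(reverberant ** 2) + 1e-12
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=reverberant.shape)
    return reverberant + noise


def multi_order_simulation(clean, rirs, order, rng=None):
    """Apply `order` recording passes in sequence (0 returns the clean signal)."""
    if order > len(rirs):
        raise ValueError("need at least one RIR per recording pass")
    out = clean
    for rir in rirs[:order]:
        out = simulate_recording(out, rir, rng=rng)
    return out


def make_toy_rir(length, decay, rng):
    """Stand-in for a measured RIR: exponentially decaying noise with a direct path."""
    rir = rng.normal(size=length) * np.exp(-decay * np.arange(length))
    rir[0] = 1.0
    return rir / np.max(np.abs(rir))


rng = np.random.default_rng(0)
sample_rate = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)  # 1 s test tone
rirs = [make_toy_rir(800, 0.01, rng), make_toy_rir(800, 0.01, rng)]

# One simulated example per class: label k is produced by k recording passes.
examples = {
    label: multi_order_simulation(clean, rirs, order=label, rng=rng)
    for label in CLASS_NAMES
}
```

In a real pipeline the toy tone and synthetic RIRs would be replaced by utterances and impulse responses drawn from existing datasets, and the labeled examples would feed the three-class pre-training stage before fine-tuning on a replay spoofing dataset.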
