Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification

Voice-controlled devices are in demand due to their hands-free controls. However, using voice-controlled devices in sensitive scenarios like smartphone applications and financial transactions requires protection against fraudulent attacks referred to as “speech spoofing”. The algorithms used in spoo...

Bibliographic Details
Main authors:	Mewada, Hiren; Al-Asad, Jawad F.; Almalki, Faris A.; Khan, Adil H.; Almujally, Nouf Abdullah; El-Nakla, Samir; Naith, Qamar
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386291/ https://www.ncbi.nlm.nih.gov/pubmed/37514931 http://dx.doi.org/10.3390/s23146637

collection	PubMed
description	Voice-controlled devices are in demand due to their hands-free controls. However, using voice-controlled devices in sensitive scenarios like smartphone applications and financial transactions requires protection against fraudulent attacks referred to as “speech spoofing”. The algorithms used in spoof attacks are practically unknown; hence, further analysis and development of spoof-detection models for improving spoof classification are required. A study of the spoofed-speech spectrum suggests that high-frequency features discriminate genuine speech from spoofed speech well. Typically, linear or triangular filter banks are used to obtain high-frequency features. However, a Gaussian filter can extract more global information than a triangular filter. In addition, MFCC features are preferred over other speech features because of their lower covariance. Therefore, in this study, the use of a Gaussian filter is proposed for the extraction of inverted MFCC (iMFCC) features, providing high-frequency features. Complementary features are integrated with iMFCC to strengthen the features that aid in the discrimination of spoofed speech. Deep learning has been proven to be efficient in classification applications, but the selection of its hyper-parameters and architecture is crucial and directly affects performance. Therefore, a Bayesian algorithm is used to optimize the BiLSTM network. Thus, in this study, we build a high-frequency-based optimized BiLSTM network to classify the spoofed-speech signal, and we present an extensive investigation using the ASVSpoof 2017 dataset. The optimized BiLSTM model was trained successfully in the fewest epochs and achieved a validation accuracy of 99.58%. The proposed algorithm achieved a 6.58% EER on the evaluation dataset, a relative improvement of 78% over a baseline spoof-identification system.
format	Online Article Text
id	pubmed-10386291
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10386291 2023-07-30 Sensors (Basel) Article MDPI 2023-07-24 /pmc/articles/PMC10386291/ /pubmed/37514931 http://dx.doi.org/10.3390/s23146637 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386291/ https://www.ncbi.nlm.nih.gov/pubmed/37514931 http://dx.doi.org/10.3390/s23146637

Similar Items