Cargando…

Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments

The presented paper is concerned with detection of presentation attacks against unsupervised remote biometric speaker verification, using a well-known challenge–response scheme. We propose a novel approach to convolutional phoneme classifier training, which ensures high phoneme recognition accuracy...

Descripción completa

Detalles Bibliográficos
Autores principales: Kubicki, Kacper, Kapusta, Paweł, Ślot, Krzysztof
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8624699/
https://www.ncbi.nlm.nih.gov/pubmed/34833803
http://dx.doi.org/10.3390/s21227728
_version_ 1784606238191910912
author Kubicki, Kacper
Kapusta, Paweł
Ślot, Krzysztof
author_facet Kubicki, Kacper
Kapusta, Paweł
Ślot, Krzysztof
author_sort Kubicki, Kacper
collection PubMed
description The presented paper is concerned with detection of presentation attacks against unsupervised remote biometric speaker verification, using a well-known challenge–response scheme. We propose a novel approach to convolutional phoneme classifier training, which ensures high phoneme recognition accuracy even for significantly simplified network architectures, thus enabling efficient utterance verification on resource-limited hardware, such as mobile phones or embedded devices. We consider Deep Convolutional Neural Networks operating on windows of speech Mel-Spectrograms as a means for phoneme recognition, and we show that one can boost the performance of highly simplified neural architectures by modifying the principle underlying training set construction. Instead of generating training examples by slicing spectrograms using a sliding window, as it is commonly done, we propose to maximize the consistency of phoneme-related spectrogram structures that are to be learned, by choosing only spectrogram chunks from the central regions of phoneme articulation intervals. This approach enables better utilization of the limited capacity of the considered simplified networks, as it significantly reduces a within-class data scatter. We show that neural architectures comprising as few as dozens of thousands parameters can successfully—with accuracy of up to 76%, solve the 39-phoneme recognition task (we use the English language TIMIT database for experimental verification of the method). We also show that ensembling of simple classifiers, using a basic bagging method, boosts the recognition accuracy by another 2–3%, offering Phoneme Error Rates at the level of 23%, which approaches the accuracy of the state-of-the-art deep neural architectures that are one to two orders of magnitude more complex than the proposed solution. This, in turn, enables executing reliable presentation attack detection, based on just few-syllable long challenges on highly resource-limited computing hardware.
format Online
Article
Text
id pubmed-8624699
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-86246992021-11-27 Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments Kubicki, Kacper Kapusta, Paweł Ślot, Krzysztof Sensors (Basel) Article The presented paper is concerned with detection of presentation attacks against unsupervised remote biometric speaker verification, using a well-known challenge–response scheme. We propose a novel approach to convolutional phoneme classifier training, which ensures high phoneme recognition accuracy even for significantly simplified network architectures, thus enabling efficient utterance verification on resource-limited hardware, such as mobile phones or embedded devices. We consider Deep Convolutional Neural Networks operating on windows of speech Mel-Spectrograms as a means for phoneme recognition, and we show that one can boost the performance of highly simplified neural architectures by modifying the principle underlying training set construction. Instead of generating training examples by slicing spectrograms using a sliding window, as it is commonly done, we propose to maximize the consistency of phoneme-related spectrogram structures that are to be learned, by choosing only spectrogram chunks from the central regions of phoneme articulation intervals. This approach enables better utilization of the limited capacity of the considered simplified networks, as it significantly reduces a within-class data scatter. We show that neural architectures comprising as few as dozens of thousands parameters can successfully—with accuracy of up to 76%, solve the 39-phoneme recognition task (we use the English language TIMIT database for experimental verification of the method). We also show that ensembling of simple classifiers, using a basic bagging method, boosts the recognition accuracy by another 2–3%, offering Phoneme Error Rates at the level of 23%, which approaches the accuracy of the state-of-the-art deep neural architectures that are one to two orders of magnitude more complex than the proposed solution. This, in turn, enables executing reliable presentation attack detection, based on just few-syllable long challenges on highly resource-limited computing hardware. MDPI 2021-11-20 /pmc/articles/PMC8624699/ /pubmed/34833803 http://dx.doi.org/10.3390/s21227728 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Kubicki, Kacper
Kapusta, Paweł
Ślot, Krzysztof
Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
title Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
title_full Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
title_fullStr Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
title_full_unstemmed Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
title_short Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
title_sort presentation attack detection on limited-resource devices using deep neural classifiers trained on consistent spectrogram fragments
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8624699/
https://www.ncbi.nlm.nih.gov/pubmed/34833803
http://dx.doi.org/10.3390/s21227728
work_keys_str_mv AT kubickikacper presentationattackdetectiononlimitedresourcedevicesusingdeepneuralclassifierstrainedonconsistentspectrogramfragments
AT kapustapaweł presentationattackdetectiononlimitedresourcedevicesusingdeepneuralclassifierstrainedonconsistentspectrogramfragments
AT slotkrzysztof presentationattackdetectiononlimitedresourcedevicesusingdeepneuralclassifierstrainedonconsistentspectrogramfragments