Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification

Deep neural networks (DNNs) have been known to be vulnerable to adversarial attacks. Adversarial training (AT) is, so far, the only method that can guarantee the robustness of DNNs to adversarial attacks. However, the robustness generalization accuracy gain of AT is still far lower than the standard...

Bibliographic Details
Main authors: Wang, Desheng, Jin, Weidong, Wu, Yunpu
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10057388/
https://www.ncbi.nlm.nih.gov/pubmed/36991962
http://dx.doi.org/10.3390/s23063252
_version_ 1785016353425457152
author Wang, Desheng
Jin, Weidong
Wu, Yunpu
author_facet Wang, Desheng
Jin, Weidong
Wu, Yunpu
author_sort Wang, Desheng
collection PubMed
description Deep neural networks (DNNs) have been known to be vulnerable to adversarial attacks. Adversarial training (AT) is, so far, the only method that can guarantee the robustness of DNNs to adversarial attacks. However, the robustness generalization accuracy gain of AT is still far lower than the standard generalization accuracy of an undefended model, and there is known to be a trade-off between the standard generalization accuracy and the robustness generalization accuracy of an adversarially trained model. In order to improve the robustness generalization and the standard generalization performance trade-off of AT, we propose a novel defense algorithm called Between-Class Adversarial Training (BCAT) that combines Between-Class learning (BC-learning) with standard AT. Specifically, BCAT mixes two adversarial examples from different classes and uses the mixed between-class adversarial examples to train a model instead of original adversarial examples during AT. We further propose BCAT+ which adopts a more powerful mixing method. BCAT and BCAT+ impose effective regularization on the feature distribution of adversarial examples to enlarge between-class distance, thus improving the robustness generalization and the standard generalization performance of AT. The proposed algorithms do not introduce any hyperparameters into standard AT; therefore, the process of hyperparameters searching can be avoided. We evaluate the proposed algorithms under both white-box attacks and black-box attacks using a spectrum of perturbation values on CIFAR-10, CIFAR-100, and SVHN datasets. The research findings indicate that our algorithms achieve better global robustness generalization performance than the state-of-the-art adversarial defense methods.
format Online
Article
Text
id pubmed-10057388
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10057388 2023-03-30 Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification Wang, Desheng Jin, Weidong Wu, Yunpu Sensors (Basel) Article
MDPI 2023-03-20 /pmc/articles/PMC10057388/ /pubmed/36991962 http://dx.doi.org/10.3390/s23063252 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Wang, Desheng
Jin, Weidong
Wu, Yunpu
Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
title Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
title_full Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
title_fullStr Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
title_full_unstemmed Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
title_short Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
title_sort between-class adversarial training for improving adversarial robustness of image classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10057388/
https://www.ncbi.nlm.nih.gov/pubmed/36991962
http://dx.doi.org/10.3390/s23063252
work_keys_str_mv AT wangdesheng betweenclassadversarialtrainingforimprovingadversarialrobustnessofimageclassification
AT jinweidong betweenclassadversarialtrainingforimprovingadversarialrobustnessofimageclassification
AT wuyunpu betweenclassadversarialtrainingforimprovingadversarialrobustnessofimageclassification
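
Illustrative note on the description field: the abstract says that BCAT mixes two adversarial examples from different classes and trains on the mixed example. The sketch below shows only that mixing step and is not the authors' implementation. The function name `mix_between_class`, the uniformly sampled mixing ratio, and the soft label built from the same ratio are assumptions borrowed from the general idea of between-class learning; this record does not confirm them, and it does not describe the stronger mixing method used by BCAT+. The inputs are assumed to be adversarial examples already produced by some attack, flattened to lists of pixel values.

```python
import random


def mix_between_class(x_a, x_b, label_a, label_b, num_classes, ratio=None, rng=random):
    """Linearly mix two already-adversarial examples from different classes.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if label_a == label_b:
        raise ValueError("between-class mixing expects two different classes")
    if len(x_a) != len(x_b):
        raise ValueError("both examples must have the same number of values")

    # Assumption: the mixing ratio is drawn uniformly from [0, 1)
    # unless the caller fixes it explicitly.
    r = rng.random() if ratio is None else ratio

    # Element-wise convex combination of the two inputs.
    mixed = [r * a + (1.0 - r) * b for a, b in zip(x_a, x_b)]

    # Assumption: the training target is a soft label with the same ratio.
    soft_label = [0.0] * num_classes
    soft_label[label_a] = r
    soft_label[label_b] = 1.0 - r
    return mixed, soft_label


# Toy usage with three-value "images" and a fixed ratio.
adv_a = [0.2, 0.4, 0.6]  # adversarial example of class 0
adv_b = [0.8, 0.0, 0.2]  # adversarial example of class 1
mixed, target = mix_between_class(adv_a, adv_b, 0, 1, num_classes=3, ratio=0.25)
```

Because the ratio is sampled per pair in this sketch, the mixing step itself adds no tunable hyperparameter, which is in line with the abstract's statement that the method introduces none.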