Enhancing the robustness of vision transformer defense against adversarial attacks based on squeeze-and-excitation module

Bibliographic Details
Main Authors: Chang, YouKang; Zhao, Hong; Wang, Weijie
Format: Online Article Text
Language: English
Published: PeerJ Inc., 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280230/
https://www.ncbi.nlm.nih.gov/pubmed/37346601
http://dx.doi.org/10.7717/peerj-cs.1197
Description
Summary: Vision Transformer (ViT) models have achieved strong results in computer vision tasks, and their performance has been shown to exceed that of convolutional neural networks (CNNs). However, the robustness of ViT models has received comparatively little study. To address this gap, we investigate the robustness of the ViT model under adversarial attacks and enhance it by introducing a ResNet-SE module that acts on the Attention module of the ViT model. The Attention module not only learns edge and line information but can also extract increasingly complex feature information; the ResNet-SE module highlights the important information in each feature map and suppresses the minor information, which helps the model extract key features. The experimental results show that the accuracy of the proposed defense method is 19.812%, 17.083%, 18.802%, 21.490%, and 18.010% against the Basic Iterative Method (BIM), C&W, DeepFool, DI²FGSM, and MDI²FGSM attacks, respectively. Compared with several other models, the proposed defense method shows strong robustness.
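
To make the squeeze-and-excitation mechanism described in the summary concrete, here is a minimal sketch of a standard SE block in PyTorch. This is not the paper's exact ResNet-SE module (the record does not describe that module's architecture or how it attaches to the ViT Attention module); the class name SEBlock, the reduction ratio of 16, and the (batch, channels, height, width) tensor layout are illustrative assumptions.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze-and-excitation: reweight channels using global context,
    # emphasizing informative feature maps and suppressing minor ones.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # restore width
            nn.Sigmoid(),                                # per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))           # squeeze: global average pool -> (b, c)
        w = self.fc(s).view(b, c, 1, 1)  # excitation: learned channel weights
        return x * w                     # scale: reweight each feature map

# Usage (hypothetical shapes): gate a batch of 64-channel feature maps.
# x = torch.randn(8, 64, 14, 14)
# y = SEBlock(64)(x)  # same shape as x, channels rescaled by learned gates

The sigmoid gates act as soft channel attention, which matches the summary's account of the SE module highlighting the important information in each feature map while suppressing the minor information.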