Transferability of features for neural networks links to adversarial attacks and defences
The reason for the existence of adversarial samples is still barely understood. Here, we explore the transferability of learned features to Out-of-Distribution (OoD) classes. We do this by assessing neural networks’ capability to encode the existing features, revealing an intriguing connection with adversarial attacks and defences. (Full abstract in the description field below; a sketch of the first proposed metric follows the record fields.)
| Main Authors: | Kotyan, Shashank; Matsuki, Moe; Vargas, Danilo Vasconcellos |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | Public Library of Science, 2022 |
| Subjects: | Research Article |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9045664/ https://www.ncbi.nlm.nih.gov/pubmed/35476838 http://dx.doi.org/10.1371/journal.pone.0266060 |
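The description below mentions a metric "based on inter-cluster validation techniques" for scoring how well features learned on In-Distribution (ID) classes represent unseen OoD classes. The paper's exact formulation is not given in this record, so the following is only a minimal sketch under assumptions: it uses scikit-learn's silhouette score as a stand-in inter-cluster validation index, and the function name, feature source (penultimate-layer activations), and toy data are all illustrative.

```python
# Minimal sketch (not the paper's exact metric): score feature transferability
# to OoD classes with an inter-cluster validation index. Silhouette is an
# assumed stand-in for whatever index the paper actually uses.
import numpy as np
from sklearn.metrics import silhouette_score

def transferability_score(ood_features: np.ndarray, ood_labels: np.ndarray) -> float:
    """Higher silhouette => features trained only on ID classes already
    separate the unseen OoD classes into tight, well-spaced clusters.

    ood_features: (n_samples, d), e.g. penultimate-layer activations for OoD inputs.
    ood_labels:   (n_samples,) ground-truth OoD class ids, never seen in training.
    """
    return float(silhouette_score(ood_features, ood_labels))

# Toy demonstration with synthetic "embeddings" for three OoD classes.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(loc=3.0 * c, scale=0.5, size=(50, 8)) for c in range(3)])
labels = np.repeat(np.arange(3), 50)
print(f"transferability (silhouette) = {transferability_score(feats, labels):.3f}")
```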
_version_ | 1784695366178832384 |
---|---|
author | Kotyan, Shashank; Matsuki, Moe; Vargas, Danilo Vasconcellos
author_facet | Kotyan, Shashank; Matsuki, Moe; Vargas, Danilo Vasconcellos
author_sort | Kotyan, Shashank |
collection | PubMed |
description | The reason for the existence of adversarial samples is still barely understood. Here, we explore the transferability of learned features to Out-of-Distribution (OoD) classes. We do this by assessing neural networks’ capability to encode the existing features, revealing an intriguing connection with adversarial attacks and defences. The principal idea is that, “if an algorithm learns rich features, such features should represent Out-of-Distribution classes as a combination of previously learned In-Distribution (ID) classes”. This is because OoD classes usually share several regular features with ID classes, given that the features learned are general enough. We further introduce two metrics to assess the transferred features representing OoD classes. One is based on inter-cluster validation techniques, while the other captures the influence of a class over learned features. Experiments suggest that several adversarial defences decrease the accuracy of some attacks and improve the transferability-of-features as measured by our metrics. Experiments also reveal a relationship between the proposed metrics and adversarial attacks (a high Pearson correlation coefficient and low p-value). Further, statistical tests suggest that several adversarial defences, in general, significantly improve transferability. Our tests suggest that models with higher transferability-of-features generally have higher robustness against adversarial attacks. Thus, the experiments suggest that the objectives of adversarial machine learning might be much closer to those of domain transfer learning than previously thought. (A sketch of the Pearson correlation test appears after this record.)
format | Online Article Text |
id | pubmed-9045664 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9045664 2022-04-28. Transferability of features for neural networks links to adversarial attacks and defences. Kotyan, Shashank; Matsuki, Moe; Vargas, Danilo Vasconcellos. PLoS One, Research Article. (Abstract as in the description field above.) Public Library of Science, 2022-04-27. /pmc/articles/PMC9045664/ /pubmed/35476838 http://dx.doi.org/10.1371/journal.pone.0266060 Text en. © 2022 Kotyan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle | Research Article; Kotyan, Shashank; Matsuki, Moe; Vargas, Danilo Vasconcellos; Transferability of features for neural networks links to adversarial attacks and defences
title | Transferability of features for neural networks links to adversarial attacks and defences |
title_full | Transferability of features for neural networks links to adversarial attacks and defences |
title_fullStr | Transferability of features for neural networks links to adversarial attacks and defences |
title_full_unstemmed | Transferability of features for neural networks links to adversarial attacks and defences |
title_short | Transferability of features for neural networks links to adversarial attacks and defences |
title_sort | transferability of features for neural networks links to adversarial attacks and defences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9045664/ https://www.ncbi.nlm.nih.gov/pubmed/35476838 http://dx.doi.org/10.1371/journal.pone.0266060 |
work_keys_str_mv | AT kotyanshashank transferabilityoffeaturesforneuralnetworkslinkstoadversarialattacksanddefences AT matsukimoe transferabilityoffeaturesforneuralnetworkslinkstoadversarialattacksanddefences AT vargasdanilovasconcellos transferabilityoffeaturesforneuralnetworkslinkstoadversarialattacksanddefences |
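The description reports "a high Pearson correlation coefficient and low p-value" between the proposed transferability metrics and adversarial attack results. The sketch below illustrates only the form of that statistical test; the numbers are invented placeholders, not results from the paper (the real values are in the article at the DOI above).

```python
# Sketch of the reported test: Pearson correlation between a per-model
# transferability metric and accuracy under adversarial attack.
# All numbers are hypothetical placeholders for illustration.
from scipy.stats import pearsonr

transferability = [0.12, 0.25, 0.31, 0.40, 0.47]  # assumed metric, one value per model
robust_accuracy = [0.08, 0.19, 0.28, 0.35, 0.41]  # assumed accuracy under attack

r, p = pearsonr(transferability, robust_accuracy)
print(f"Pearson r = {r:.3f}, p-value = {p:.4f}")
```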