
Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks

Due to the growing number of cyber attacks on the Internet, the demand for accurate intrusion detection systems (IDS) to prevent these attacks is increasing. To this end, Machine Learning (ML) components have been proposed as an efficient and effective solution. However, its applicability scope...


Bibliographic Details
Main Authors:	Mozo, Alberto; González-Prieto, Ángel; Pastor, Antonio; Gómez-Canaval, Sandra; Talavera, Edgar
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8825844/ https://www.ncbi.nlm.nih.gov/pubmed/35136144 http://dx.doi.org/10.1038/s41598-022-06057-2

collection	PubMed
description	Due to the growing number of cyber attacks on the Internet, the demand for accurate intrusion detection systems (IDS) to prevent these attacks is increasing. To this end, Machine Learning (ML) components have been proposed as an efficient and effective solution. However, their applicability is limited by two important issues: (i) the shortage of network traffic datasets for attack analysis, and (ii) the privacy constraints on the data to be used. To overcome these problems, Generative Adversarial Networks (GANs) have been proposed for synthetic flow-based network traffic generation. However, due to the ill-convergence of GAN training, none of the existing solutions can generate high-quality, fully synthetic data that can totally substitute real data in the training of ML components. Instead, they mix real with synthetic data, which acts only as data augmentation, leading to privacy breaches as real data is used. In sharp contrast, in this work we propose a novel and deterministic way to measure the quality of the synthetic data produced by a GAN, both with respect to the real data and to its performance when used for ML tasks. As a by-product, we present a heuristic that uses these metrics for selecting the best-performing generator during GAN training, leading to a novel stopping criterion, which can be applied even when different types of synthetic data are to be used in the same ML task. We demonstrate the adequacy of our proposal by generating synthetic cryptomining attacks and normal traffic flow-based data using an enhanced version of a Wasserstein GAN. The results show that the generated synthetic network traffic can completely replace real data when training an ML-based cryptomining detector, obtaining similar performance and avoiding privacy violations, since real data is not used in the training of the ML-based detector.
format	Online Article Text
id	pubmed-8825844
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-8825844 2022-02-09 Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks Mozo, Alberto; González-Prieto, Ángel; Pastor, Antonio; Gómez-Canaval, Sandra; Talavera, Edgar Sci Rep Article Nature Publishing Group UK 2022-02-08 /pmc/articles/PMC8825844/ /pubmed/35136144 http://dx.doi.org/10.1038/s41598-022-06057-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title	Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8825844/ https://www.ncbi.nlm.nih.gov/pubmed/35136144 http://dx.doi.org/10.1038/s41598-022-06057-2


Similar Items