
CTTGAN: Traffic Data Synthesizing Scheme Based on Conditional GAN



Bibliographic Details
Main Authors: Wang, Jiayu, Yan, Xuehu, Liu, Lintao, Li, Longlong, Yu, Yongqiang
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9322752/
https://www.ncbi.nlm.nih.gov/pubmed/35890921
http://dx.doi.org/10.3390/s22145243
author Wang, Jiayu
Yan, Xuehu
Liu, Lintao
Li, Longlong
Yu, Yongqiang
collection PubMed
description Most machine learning algorithms achieve a good recognition rate only on balanced datasets. In malicious traffic identification, however, benign traffic on a network far outnumbers malicious traffic, so network traffic datasets are imbalanced and algorithms have a low identification rate for minority classes of malicious traffic samples. This paper presents a traffic sample synthesizing model named Conditional Tabular Traffic Generative Adversarial Network (CTTGAN), which uses the Conditional Tabular Generative Adversarial Network (CTGAN) algorithm to expand the minority-class traffic samples and balance the dataset in order to improve the malicious traffic identification rate. The CTTGAN model expands and recognizes feature data, which meets the requirements of machine learning algorithms for training and prediction data. The contributions of this paper are as follows: first, the minority-class samples are expanded and the traffic dataset is balanced; second, storage cost and computational complexity are reduced compared to models that use image data; third, discrete and continuous variables in traffic feature data are processed simultaneously, and the data distribution is described well. The experimental results show that the recognition rate of the expanded samples exceeds 0.99 with the MLP, KNN and SVM algorithms. In addition, the recognition rate of the proposed CTTGAN model is better than that of the oversampling and undersampling schemes.
format Online
Article
Text
id pubmed-9322752
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
source Sensors (Basel), MDPI, 2022-07-13
rights © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title CTTGAN: Traffic Data Synthesizing Scheme Based on Conditional GAN
topic Article
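The balancing step described in the abstract (expanding the minority classes until the traffic dataset is balanced) can be illustrated with a minimal sketch. It counts samples per class, computes how many synthetic rows each minority class needs to match the largest class, and fills that quota with a pluggable generator. The generator used here, `random_oversample`, is a hypothetical stand-in that simply duplicates existing rows (plain random oversampling, one of the baselines the abstract compares against). It is not CTGAN and not the authors' implementation. The function names and the toy feature rows are illustrative assumptions.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Row = Tuple[float, ...]
Generator = Callable[[List[Row], int, random.Random], List[Row]]


def synthesis_quota(labels: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Return how many synthetic rows each class needs to reach the majority count."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items() if target - n > 0}


def random_oversample(rows: List[Row], k: int, rng: random.Random) -> List[Row]:
    """Hypothetical stand-in generator: draws k existing rows with replacement.

    A CTGAN-based scheme would instead sample k new rows from a conditional
    tabular GAN trained on the data, conditioned on this class.
    """
    return [rng.choice(rows) for _ in range(k)]


def balance(
    rows: Sequence[Row],
    labels: Sequence[Hashable],
    generate: Generator = random_oversample,
    seed: int = 0,
) -> Tuple[List[Row], List[Hashable]]:
    """Append synthetic rows so that every class matches the largest class."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[Row]] = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    out_rows, out_labels = list(rows), list(labels)
    for label, k in synthesis_quota(labels).items():
        out_rows.extend(generate(by_class[label], k, rng))
        out_labels.extend([label] * k)
    return out_rows, out_labels


if __name__ == "__main__":
    # Toy, made-up feature rows (e.g. duration, destination port) and labels.
    rows = [(0.1, 80.0), (0.2, 443.0), (0.3, 53.0), (0.4, 80.0), (0.9, 22.0), (0.8, 23.0)]
    labels = ["benign", "benign", "benign", "benign", "dos", "scan"]
    new_rows, new_labels = balance(rows, labels)
    print(Counter(new_labels))  # each class now has 4 rows
```

Keeping the generator as a parameter separates the quota logic, which is the same for any balancing scheme, from the sample-synthesis method. A CTGAN-style generator, or any other oversampler, could then be swapped in without changing `balance`.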