CTTGAN: Traffic Data Synthesizing Scheme Based on Conditional GAN
Most machine learning algorithms only have a good recognition rate on balanced datasets. However, in the field of malicious traffic identification, benign traffic on the network is far greater than malicious traffic, and the network traffic dataset is imbalanced, which makes the algorithm have a low...
Main authors: | Wang, Jiayu; Yan, Xuehu; Liu, Lintao; Li, Longlong; Yu, Yongqiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9322752/ https://www.ncbi.nlm.nih.gov/pubmed/35890921 http://dx.doi.org/10.3390/s22145243 |
_version_ | 1784756382388453376 |
---|---|
author | Wang, Jiayu; Yan, Xuehu; Liu, Lintao; Li, Longlong; Yu, Yongqiang |
author_facet | Wang, Jiayu; Yan, Xuehu; Liu, Lintao; Li, Longlong; Yu, Yongqiang |
author_sort | Wang, Jiayu |
collection | PubMed |
description | Most machine learning algorithms only have a good recognition rate on balanced datasets. However, in the field of malicious traffic identification, benign traffic on the network is far greater than malicious traffic, and the network traffic dataset is imbalanced, which makes the algorithm have a low identification rate for small categories of malicious traffic samples. This paper presents a traffic sample synthesizing model named Conditional Tabular Traffic Generative Adversarial Network (CTTGAN), which uses a Conditional Tabular Generative Adversarial Network (CTGAN) algorithm to expand the small category traffic samples and balance the dataset in order to improve the malicious traffic identification rate. The CTTGAN model expands and recognizes feature data, which meets the requirements of a machine learning algorithm for training and prediction data. The contributions of this paper are as follows: first, the small category samples are expanded and the traffic dataset is balanced; second, the storage cost and computational complexity are reduced compared to models using image data; third, discrete variables and continuous variables in traffic feature data are processed at the same time, and the data distribution is described well. The experimental results show that the recognition rate of the expanded samples is more than 0.99 in MLP, KNN and SVM algorithms. In addition, the recognition rate of the proposed CTTGAN model is better than the oversampling and undersampling schemes. |
format | Online Article Text |
id | pubmed-9322752 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9322752 2022-07-27 CTTGAN: Traffic Data Synthesizing Scheme Based on Conditional GAN Wang, Jiayu Yan, Xuehu Liu, Lintao Li, Longlong Yu, Yongqiang Sensors (Basel) Article Most machine learning algorithms only have a good recognition rate on balanced datasets. However, in the field of malicious traffic identification, benign traffic on the network is far greater than malicious traffic, and the network traffic dataset is imbalanced, which makes the algorithm have a low identification rate for small categories of malicious traffic samples. This paper presents a traffic sample synthesizing model named Conditional Tabular Traffic Generative Adversarial Network (CTTGAN), which uses a Conditional Tabular Generative Adversarial Network (CTGAN) algorithm to expand the small category traffic samples and balance the dataset in order to improve the malicious traffic identification rate. The CTTGAN model expands and recognizes feature data, which meets the requirements of a machine learning algorithm for training and prediction data. The contributions of this paper are as follows: first, the small category samples are expanded and the traffic dataset is balanced; second, the storage cost and computational complexity are reduced compared to models using image data; third, discrete variables and continuous variables in traffic feature data are processed at the same time, and the data distribution is described well. The experimental results show that the recognition rate of the expanded samples is more than 0.99 in MLP, KNN and SVM algorithms. In addition, the recognition rate of the proposed CTTGAN model is better than the oversampling and undersampling schemes. MDPI 2022-07-13 /pmc/articles/PMC9322752/ /pubmed/35890921 http://dx.doi.org/10.3390/s22145243 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Wang, Jiayu Yan, Xuehu Liu, Lintao Li, Longlong Yu, Yongqiang CTTGAN: Traffic Data Synthesizing Scheme Based on Conditional GAN |
title | CTTGAN: Traffic Data Synthesizing Scheme Based on Conditional GAN |
title_full | CTTGAN: Traffic Data Synthesizing Scheme Based on Conditional GAN |
title_fullStr | CTTGAN: Traffic Data Synthesizing Scheme Based on Conditional GAN |
title_full_unstemmed | CTTGAN: Traffic Data Synthesizing Scheme Based on Conditional GAN |
title_short | CTTGAN: Traffic Data Synthesizing Scheme Based on Conditional GAN |
title_sort | cttgan: traffic data synthesizing scheme based on conditional gan |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9322752/ https://www.ncbi.nlm.nih.gov/pubmed/35890921 http://dx.doi.org/10.3390/s22145243 |
work_keys_str_mv | AT wangjiayu cttgantrafficdatasynthesizingschemebasedonconditionalgan AT yanxuehu cttgantrafficdatasynthesizingschemebasedonconditionalgan AT liulintao cttgantrafficdatasynthesizingschemebasedonconditionalgan AT lilonglong cttgantrafficdatasynthesizingschemebasedonconditionalgan AT yuyongqiang cttgantrafficdatasynthesizingschemebasedonconditionalgan |