On the Impact of Network Data Balancing in Cybersecurity Applications

Machine learning methods are now widely used to detect a wide range of cyberattacks. Nevertheless, the commonly used algorithms come with challenges of their own: one of them lies in network dataset characteristics. The dataset should be well-balanced in terms of the number of malicious data sample...

Bibliographic Details
Main authors: Pawlicki, Marek; Choraś, Michał; Kozik, Rafał; Hołubowicz, Witold
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303680/
http://dx.doi.org/10.1007/978-3-030-50423-6_15
_version_ 1783548111662088192
author Pawlicki, Marek
Choraś, Michał
Kozik, Rafał
Hołubowicz, Witold
collection PubMed
description Machine learning methods are now widely used to detect a wide range of cyberattacks. Nevertheless, the commonly used algorithms come with challenges of their own: one of them lies in network dataset characteristics. The dataset should be well-balanced in terms of the number of malicious data samples vs. benign traffic samples to achieve adequate results. When the data is not balanced, numerous machine learning approaches show a tendency to classify minority class samples as majority class samples. Since network traffic data usually contains significantly fewer malicious samples than benign samples, this work addresses the problem of learning from imbalanced network traffic data in the cybersecurity domain. A number of balancing approaches are evaluated along with their impact on different machine learning algorithms.
format Online
Article
Text
id pubmed-7303680
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7303680 2020-06-19 On the Impact of Network Data Balancing in Cybersecurity Applications Pawlicki, Marek Choraś, Michał Kozik, Rafał Hołubowicz, Witold Computational Science – ICCS 2020 Article Machine learning methods are now widely used to detect a wide range of cyberattacks. Nevertheless, the commonly used algorithms come with challenges of their own: one of them lies in network dataset characteristics. The dataset should be well-balanced in terms of the number of malicious data samples vs. benign traffic samples to achieve adequate results. When the data is not balanced, numerous machine learning approaches show a tendency to classify minority class samples as majority class samples. Since network traffic data usually contains significantly fewer malicious samples than benign samples, this work addresses the problem of learning from imbalanced network traffic data in the cybersecurity domain. A number of balancing approaches are evaluated along with their impact on different machine learning algorithms. 2020-05-23 /pmc/articles/PMC7303680/ http://dx.doi.org/10.1007/978-3-030-50423-6_15 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title On the Impact of Network Data Balancing in Cybersecurity Applications
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303680/
http://dx.doi.org/10.1007/978-3-030-50423-6_15