
Evaluation of Machine Learning Techniques for Traffic Flow-Based Intrusion Detection

Cybersecurity is one of the great challenges of today’s world. Rapid technological development has allowed society to prosper and improve the quality of life and the world is more dependent on new technologies. Managing security risks quickly and effectively, preventing, identifying, or mitigating t...


Bibliographic Details
Main authors:	Rodríguez, María, Alesanco, Álvaro, Mehavilla, Lorena, García, José
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9740321/ https://www.ncbi.nlm.nih.gov/pubmed/36502028 http://dx.doi.org/10.3390/s22239326

author	Rodríguez, María Alesanco, Álvaro Mehavilla, Lorena García, José
collection	PubMed
description	Cybersecurity is one of the great challenges of today’s world. Rapid technological development has allowed society to prosper and has improved the quality of life, and the world has become more dependent on new technologies. Managing security risks quickly and effectively, whether by preventing, identifying, or mitigating them, is a great challenge. The appearance of new attacks, with increasing frequency, requires constant updating of threat detection methods. Traditional signature-based techniques are effective against known attacks, but they cannot detect new ones. For this reason, intrusion detection systems (IDS) that apply machine learning (ML) techniques represent an alternative that is gaining importance. In this work, we have analyzed different machine learning techniques to determine which ones yield the best traffic classification results, based on classification performance measurements and on execution times, which are decisive for later real-time deployments. The CICIDS2017 dataset was selected because it contains bidirectional traffic flows (derived from traffic captures) that include benign traffic and different types of up-to-date attacks. Each traffic flow is characterized by a set of connection-related attributes that can be used to model the traffic and distinguish between attacks and normal flows. CICIDS2017 also contains the raw network traffic captures collected during the dataset creation in a packet-based format, which makes it possible to extract the traffic flows from them. Various classification techniques have been evaluated using the Weka software: naive Bayes, logistic, multilayer perceptron, sequential minimal optimization, k-nearest neighbors, adaptive boosting, OneR, J48, PART, and random forest. As a general result, methods based on decision trees (PART, J48, and random forest) have turned out to be the most efficient, with F1 values above 0.999 (average obtained in the complete dataset).
Moreover, multiclass classification (distinguishing between different types of attack) and binary classification (distinguishing only between normal traffic and attack) have been compared, and the effect of reducing the number of attributes using the correlation-based feature selection (CFS) technique has been evaluated. The lower complexity of binary classification yields better results, and selecting a reduced set of the most relevant attributes requires less time (a decrease of more than 30% in the time required to test the model) at the cost of a small performance loss. The tree-based techniques with CFS attribute selection (six attributes selected) reached F1 values above 0.990 in the complete dataset. Finally, a conventional tool, Zeek, has been used to process the raw traffic captures in order to identify the traffic flows and obtain a reduced set of attributes from them. The classification results obtained using tree-based techniques (with 14 Zeek-based attributes) were also very high, with F1 above 0.997 (average obtained in the complete dataset) and low execution times (allowing several hundred thousand flows/s to be processed). These classification results on the CICIDS2017 dataset suggest that tree-based machine learning techniques may be appropriate for flow-based intrusion detection and that algorithms such as PART or J48 may offer a faster alternative to the random forest (RF) technique.
format	Online Article Text
id	pubmed-9740321
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9740321 2022-12-11 Evaluation of Machine Learning Techniques for Traffic Flow-Based Intrusion Detection Rodríguez, María Alesanco, Álvaro Mehavilla, Lorena García, José Sensors (Basel) Article MDPI 2022-11-30 /pmc/articles/PMC9740321/ /pubmed/36502028 http://dx.doi.org/10.3390/s22239326 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Evaluation of Machine Learning Techniques for Traffic Flow-Based Intrusion Detection
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9740321/ https://www.ncbi.nlm.nih.gov/pubmed/36502028 http://dx.doi.org/10.3390/s22239326
