How to Effectively Collect and Process Network Data for Intrusion Detection?

The number of security breaches in the cyberspace is on the rise. This threat is met with intensive work in the intrusion detection research community. To keep the defensive mechanisms up to date and relevant, realistic network traffic datasets are needed. The use of flow-based data for machine-lear...

Bibliographic Details
Main Authors:	Komisarek, Mikołaj; Pawlicki, Marek; Kozik, Rafał; Hołubowicz, Witold; Choraś, Michał
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8619486/ https://www.ncbi.nlm.nih.gov/pubmed/34828230 http://dx.doi.org/10.3390/e23111532

_version_	1784605004010618880
author	Komisarek, Mikołaj Pawlicki, Marek Kozik, Rafał Hołubowicz, Witold Choraś, Michał
collection	PubMed
description	The number of security breaches in the cyberspace is on the rise. This threat is met with intensive work in the intrusion detection research community. To keep the defensive mechanisms up to date and relevant, realistic network traffic datasets are needed. The use of flow-based data for machine-learning-based network intrusion detection is a promising direction for intrusion detection systems. However, many contemporary benchmark datasets do not contain features that are usable in the wild. The main contribution of this work is to cover the research gap related to identifying and investigating valuable features in the NetFlow schema that allow for effective, machine-learning-based network intrusion detection in the real world. To achieve this goal, several feature selection techniques have been applied on five flow-based network intrusion detection datasets, establishing an informative flow-based feature set. The authors’ experience with the deployment of this kind of system shows that to close the research-to-market gap, and to perform actual real-world application of machine-learning-based intrusion detection, a set of labeled data from the end-user has to be collected. This research aims at establishing the appropriate, minimal amount of data that is sufficient to effectively train machine learning algorithms in intrusion detection. The results show that a set of 10 features and a small amount of data is enough for the final model to perform very well.
format	Online Article Text
id	pubmed-8619486
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8619486 2021-11-27 How to Effectively Collect and Process Network Data for Intrusion Detection? Komisarek, Mikołaj Pawlicki, Marek Kozik, Rafał Hołubowicz, Witold Choraś, Michał Entropy (Basel) Article The number of security breaches in the cyberspace is on the rise. This threat is met with intensive work in the intrusion detection research community. To keep the defensive mechanisms up to date and relevant, realistic network traffic datasets are needed. The use of flow-based data for machine-learning-based network intrusion detection is a promising direction for intrusion detection systems. However, many contemporary benchmark datasets do not contain features that are usable in the wild. The main contribution of this work is to cover the research gap related to identifying and investigating valuable features in the NetFlow schema that allow for effective, machine-learning-based network intrusion detection in the real world. To achieve this goal, several feature selection techniques have been applied on five flow-based network intrusion detection datasets, establishing an informative flow-based feature set. The authors’ experience with the deployment of this kind of system shows that to close the research-to-market gap, and to perform actual real-world application of machine-learning-based intrusion detection, a set of labeled data from the end-user has to be collected. This research aims at establishing the appropriate, minimal amount of data that is sufficient to effectively train machine learning algorithms in intrusion detection. The results show that a set of 10 features and a small amount of data is enough for the final model to perform very well. MDPI 2021-11-18 /pmc/articles/PMC8619486/ /pubmed/34828230 http://dx.doi.org/10.3390/e23111532 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	How to Effectively Collect and Process Network Data for Intrusion Detection?
topic	Article
