SELID: Selective Event Labeling for Intrusion Detection Datasets

Bibliographic Details
Main authors:	Jang, Woohyuk; Kim, Hyunmin; Seo, Hyungbin; Kim, Minsong; Yoon, Myungkeun
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10347169/ https://www.ncbi.nlm.nih.gov/pubmed/37447954 http://dx.doi.org/10.3390/s23136105

collection	PubMed
description	A large volume of security events, generally collected by distributed monitoring sensors, overwhelms human analysts at security operations centers and raises an alert fatigue problem. Machine learning is expected to mitigate this problem by automatically distinguishing between true alerts, or attacks, and falsely reported ones. Machine learning models should first be trained on datasets having correct labels, but the labeling process itself requires considerable human resources. In this paper, we present a new selective sampling scheme for efficient data labeling via unsupervised clustering. The new scheme transforms the byte sequence of an event into a fixed-size vector through content-defined chunking and feature hashing. Then, a clustering algorithm is applied to the vectors, and only a few samples from each cluster are selected for manual labeling. The experimental results demonstrate that the new scheme can select only 2% of the data for labeling without degrading the F1-score of the machine learning model. Two datasets, a private dataset from a real security operations center and a public dataset from the Internet for experimental reproducibility, are used.
id	pubmed-10347169
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Sensors (Basel)
date	2023-07-02
rights	© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
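The labeling pipeline summarized in the description (content-defined chunking, feature hashing into a fixed-size vector, clustering, then sampling a few events per cluster) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The description does not name the rolling hash, the chunk-size bounds, the vector dimension, the clustering algorithm, or the rule for choosing samples within a cluster. The polynomial rolling hash, the BLAKE2b bucket hash, k-means, and nearest-to-centroid selection used here are therefore illustrative substitutes, and every function name and parameter value is hypothetical.

```python
# Hypothetical sketch of a SELID-style selective labeling pipeline.
# Algorithm choices and parameter values are assumptions, not the paper's.
import hashlib
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cdc_chunks(data: bytes, window: int = 8, mask: int = 0x0F,
               min_size: int = 4, max_size: int = 64) -> List[bytes]:
    """Split bytes where a rolling hash of the last `window` bytes has
    (hash & mask) == 0, so boundaries follow content, not fixed offsets."""
    base, mod = 257, (1 << 31) - 1
    drop = pow(base, window - 1, mod)  # weight of the oldest byte in the window
    chunks: List[bytes] = []
    start, h = 0, 0
    for i, b in enumerate(data):
        if i - start >= window:
            h = (h - data[i - window] * drop) % mod  # slide the oldest byte out
        h = (h * base + b) % mod                      # slide the new byte in
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0                       # restart the hash per chunk
    if start < len(data):
        chunks.append(data[start:])                   # trailing partial chunk
    return chunks


def feature_hash(chunks: Sequence[bytes], dim: int = 64) -> Vector:
    """Hashing trick: count chunks per hash bucket, then L2-normalize."""
    vec = [0.0] * dim
    for c in chunks:
        digest = hashlib.blake2b(c, digest_size=8).digest()
        vec[int.from_bytes(digest, "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], k: int, iters: int = 20,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Plain Lloyd's k-means with randomly sampled initial centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: dist2(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_for_labeling(vectors: List[Vector], labels: List[int],
                        centroids: List[Vector], per_cluster: int = 1) -> List[int]:
    """Return indices of the `per_cluster` events nearest each centroid."""
    by_cluster: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        by_cluster.setdefault(lab, []).append(i)
    picked: List[int] = []
    for c, idxs in by_cluster.items():
        idxs.sort(key=lambda i: dist2(vectors[i], centroids[c]))
        picked.extend(idxs[:per_cluster])
    return sorted(picked)


if __name__ == "__main__":
    # Toy events: two payload families with small per-event variations.
    events = ([b"GET /index.html HTTP/1.1\r\nHost: a.example\r\n" + bytes([i])
               for i in range(50)] +
              [b"\x90" * 40 + b"/bin/sh -c id" + bytes([i]) for i in range(50)])
    vecs = [feature_hash(cdc_chunks(e)) for e in events]
    labels, centroids = kmeans(vecs, k=2)
    picked = select_for_labeling(vecs, labels, centroids, per_cluster=1)
    print(f"label {len(picked)} of {len(events)} events: {picked}")
```

Under these assumptions, an analyst labels at most `k * per_cluster` of the N events. The remaining events could inherit the label of their cluster's representatives. `k`, `per_cluster`, `window`, `mask`, and `dim` are the knobs that set the labeling budget in this sketch; the 2% figure reported in the description comes from the authors' own method and datasets, not from this code.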
