Cargando…

An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques

Insider threats are malicious acts that can be carried out by an authorized employee within an organization. Insider threats represent a major cybersecurity challenge for private and public organizations, as an insider attack can cause extensive damage to organization assets much more than external...

Descripción completa

Detalles Bibliográficos
Autores principales:	Al-Shehari, Taher, Alsowail, Rakan A.
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	MDPI 2021
Materias:	Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8535057/ https://www.ncbi.nlm.nih.gov/pubmed/34681982 http://dx.doi.org/10.3390/e23101258

_version_	1784587685665439744
author	Al-Shehari, Taher Alsowail, Rakan A.
author_facet	Al-Shehari, Taher Alsowail, Rakan A.
author_sort	Al-Shehari, Taher
collection	PubMed
description	Insider threats are malicious acts that can be carried out by an authorized employee within an organization. Insider threats represent a major cybersecurity challenge for private and public organizations, as an insider attack can cause extensive damage to organization assets much more than external attacks. Most existing approaches in the field of insider threat focused on detecting general insider attack scenarios. However, insider attacks can be carried out in different ways, and the most dangerous one is a data leakage attack that can be executed by a malicious insider before his/her leaving an organization. This paper proposes a machine learning-based model for detecting such serious insider threat incidents. The proposed model addresses the possible bias of detection results that can occur due to an inappropriate encoding process by employing the feature scaling and one-hot encoding techniques. Furthermore, the imbalance issue of the utilized dataset is also addressed utilizing the synthetic minority oversampling technique (SMOTE). Well known machine learning algorithms are employed to detect the most accurate classifier that can detect data leakage events executed by malicious insiders during the sensitive period before they leave an organization. We provide a proof of concept for our model by applying it on CMU-CERT Insider Threat Dataset and comparing its performance with the ground truth. The experimental results show that our model detects insider data leakage events with an AUC-ROC value of 0.99, outperforming the existing approaches that are validated on the same dataset. The proposed model provides effective methods to address possible bias and class imbalance issues for the aim of devising an effective insider data leakage detection system.
format	Online Article Text
id	pubmed-8535057
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-85350572021-10-23 An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques Al-Shehari, Taher Alsowail, Rakan A. Entropy (Basel) Article Insider threats are malicious acts that can be carried out by an authorized employee within an organization. Insider threats represent a major cybersecurity challenge for private and public organizations, as an insider attack can cause extensive damage to organization assets much more than external attacks. Most existing approaches in the field of insider threat focused on detecting general insider attack scenarios. However, insider attacks can be carried out in different ways, and the most dangerous one is a data leakage attack that can be executed by a malicious insider before his/her leaving an organization. This paper proposes a machine learning-based model for detecting such serious insider threat incidents. The proposed model addresses the possible bias of detection results that can occur due to an inappropriate encoding process by employing the feature scaling and one-hot encoding techniques. Furthermore, the imbalance issue of the utilized dataset is also addressed utilizing the synthetic minority oversampling technique (SMOTE). Well known machine learning algorithms are employed to detect the most accurate classifier that can detect data leakage events executed by malicious insiders during the sensitive period before they leave an organization. We provide a proof of concept for our model by applying it on CMU-CERT Insider Threat Dataset and comparing its performance with the ground truth. The experimental results show that our model detects insider data leakage events with an AUC-ROC value of 0.99, outperforming the existing approaches that are validated on the same dataset. The proposed model provides effective methods to address possible bias and class imbalance issues for the aim of devising an effective insider data leakage detection system. MDPI 2021-09-27 /pmc/articles/PMC8535057/ /pubmed/34681982 http://dx.doi.org/10.3390/e23101258 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Al-Shehari, Taher Alsowail, Rakan A. An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques
title	An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques
title_full	An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques
title_fullStr	An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques
title_full_unstemmed	An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques
title_short	An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques
title_sort	insider data leakage detection using one-hot encoding, synthetic minority oversampling and machine learning techniques
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8535057/ https://www.ncbi.nlm.nih.gov/pubmed/34681982 http://dx.doi.org/10.3390/e23101258
work_keys_str_mv	AT alsheharitaher aninsiderdataleakagedetectionusingonehotencodingsyntheticminorityoversamplingandmachinelearningtechniques AT alsowailrakana aninsiderdataleakagedetectionusingonehotencodingsyntheticminorityoversamplingandmachinelearningtechniques AT alsheharitaher insiderdataleakagedetectionusingonehotencodingsyntheticminorityoversamplingandmachinelearningtechniques AT alsowailrakana insiderdataleakagedetectionusingonehotencodingsyntheticminorityoversamplingandmachinelearningtechniques

An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques

Ejemplares similares