Cargando…
An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques
Insider threats are malicious acts that can be carried out by an authorized employee within an organization. Insider threats represent a major cybersecurity challenge for private and public organizations, as an insider attack can cause extensive damage to organization assets much more than external...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
MDPI
2021
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8535057/ https://www.ncbi.nlm.nih.gov/pubmed/34681982 http://dx.doi.org/10.3390/e23101258 |
_version_ | 1784587685665439744 |
---|---|
author | Al-Shehari, Taher Alsowail, Rakan A. |
author_facet | Al-Shehari, Taher Alsowail, Rakan A. |
author_sort | Al-Shehari, Taher |
collection | PubMed |
description | Insider threats are malicious acts that can be carried out by an authorized employee within an organization. Insider threats represent a major cybersecurity challenge for private and public organizations, as an insider attack can cause extensive damage to organization assets much more than external attacks. Most existing approaches in the field of insider threat focused on detecting general insider attack scenarios. However, insider attacks can be carried out in different ways, and the most dangerous one is a data leakage attack that can be executed by a malicious insider before his/her leaving an organization. This paper proposes a machine learning-based model for detecting such serious insider threat incidents. The proposed model addresses the possible bias of detection results that can occur due to an inappropriate encoding process by employing the feature scaling and one-hot encoding techniques. Furthermore, the imbalance issue of the utilized dataset is also addressed utilizing the synthetic minority oversampling technique (SMOTE). Well known machine learning algorithms are employed to detect the most accurate classifier that can detect data leakage events executed by malicious insiders during the sensitive period before they leave an organization. We provide a proof of concept for our model by applying it on CMU-CERT Insider Threat Dataset and comparing its performance with the ground truth. The experimental results show that our model detects insider data leakage events with an AUC-ROC value of 0.99, outperforming the existing approaches that are validated on the same dataset. The proposed model provides effective methods to address possible bias and class imbalance issues for the aim of devising an effective insider data leakage detection system. |
format | Online Article Text |
id | pubmed-8535057 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-85350572021-10-23 An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques Al-Shehari, Taher Alsowail, Rakan A. Entropy (Basel) Article Insider threats are malicious acts that can be carried out by an authorized employee within an organization. Insider threats represent a major cybersecurity challenge for private and public organizations, as an insider attack can cause extensive damage to organization assets much more than external attacks. Most existing approaches in the field of insider threat focused on detecting general insider attack scenarios. However, insider attacks can be carried out in different ways, and the most dangerous one is a data leakage attack that can be executed by a malicious insider before his/her leaving an organization. This paper proposes a machine learning-based model for detecting such serious insider threat incidents. The proposed model addresses the possible bias of detection results that can occur due to an inappropriate encoding process by employing the feature scaling and one-hot encoding techniques. Furthermore, the imbalance issue of the utilized dataset is also addressed utilizing the synthetic minority oversampling technique (SMOTE). Well known machine learning algorithms are employed to detect the most accurate classifier that can detect data leakage events executed by malicious insiders during the sensitive period before they leave an organization. We provide a proof of concept for our model by applying it on CMU-CERT Insider Threat Dataset and comparing its performance with the ground truth. The experimental results show that our model detects insider data leakage events with an AUC-ROC value of 0.99, outperforming the existing approaches that are validated on the same dataset. The proposed model provides effective methods to address possible bias and class imbalance issues for the aim of devising an effective insider data leakage detection system. MDPI 2021-09-27 /pmc/articles/PMC8535057/ /pubmed/34681982 http://dx.doi.org/10.3390/e23101258 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Al-Shehari, Taher Alsowail, Rakan A. An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques |
title | An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques |
title_full | An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques |
title_fullStr | An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques |
title_full_unstemmed | An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques |
title_short | An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques |
title_sort | insider data leakage detection using one-hot encoding, synthetic minority oversampling and machine learning techniques |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8535057/ https://www.ncbi.nlm.nih.gov/pubmed/34681982 http://dx.doi.org/10.3390/e23101258 |
work_keys_str_mv | AT alsheharitaher aninsiderdataleakagedetectionusingonehotencodingsyntheticminorityoversamplingandmachinelearningtechniques AT alsowailrakana aninsiderdataleakagedetectionusingonehotencodingsyntheticminorityoversamplingandmachinelearningtechniques AT alsheharitaher insiderdataleakagedetectionusingonehotencodingsyntheticminorityoversamplingandmachinelearningtechniques AT alsowailrakana insiderdataleakagedetectionusingonehotencodingsyntheticminorityoversamplingandmachinelearningtechniques |