
An Improved Vulnerability Exploitation Prediction Model with Novel Cost Function and Custom Trained Word Vector Embedding

Successful cyber-attacks are caused by the exploitation of some vulnerabilities in the software and/or hardware that exist in systems deployed in premises or the cloud. Although hundreds of vulnerabilities are discovered every year, only a small fraction of them actually become exploited, thereby th...


Bibliographic Details
Main authors: Hoque, Mohammad Shamsul; Jamil, Norziana; Amin, Nowshad; Lam, Kwok-Yan
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8235709/
https://www.ncbi.nlm.nih.gov/pubmed/34202977
http://dx.doi.org/10.3390/s21124220
_version_ 1783714381881671680
author Hoque, Mohammad Shamsul
Jamil, Norziana
Amin, Nowshad
Lam, Kwok-Yan
collection PubMed
description Successful cyber-attacks are caused by the exploitation of vulnerabilities in the software and/or hardware of systems deployed on premises or in the cloud. Although hundreds of vulnerabilities are discovered every year, only a small fraction of them are actually exploited, so there is a severe class imbalance between exploited and non-exploited vulnerabilities. The open source National Vulnerability Database, the largest repository that indexes and maintains all known vulnerabilities, assigns a unique identifier to each vulnerability. Each registered vulnerability also receives a severity score based on the impact it might inflict if exploited. Recent research has shown that the CVSS score is not the only factor in selecting a vulnerability for exploitation, and that other attributes in the National Vulnerability Database can be used effectively as predictive features to identify the most exploitable vulnerabilities. Since cybersecurity management is highly resource-intensive, organizations, such as those operating cloud systems, benefit when the vulnerabilities in their system software or hardware that are most likely to be exploited can be predicted with as much accuracy and reliability as possible, so that the available resources can be used to fix those first. Existing research has developed vulnerability exploitation prediction models that address the class imbalance with algorithmic and artificial data resampling techniques, but these models still suffer greatly from overfitting to the majority class, which renders them practically unreliable. In this research, we have designed a novel cost function feature to address the existing class imbalance. We have also used the large text corpus available in the extracted dataset to develop a custom-trained word vector that can better capture the context of the local text data, for use as an embedding layer in neural networks. Our vulnerability exploitation prediction models, powered by the novel cost function and the custom-trained word vector, have achieved very high overall performance, with accuracy, precision, recall, F1-score and AUC values of 0.92, 0.89, 0.98, 0.94 and 0.97, respectively, thereby outperforming existing models while successfully overcoming the overfitting problem caused by class imbalance.
format Online
Article
Text
id pubmed-8235709
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8235709 2021-06-27 An Improved Vulnerability Exploitation Prediction Model with Novel Cost Function and Custom Trained Word Vector Embedding Hoque, Mohammad Shamsul Jamil, Norziana Amin, Nowshad Lam, Kwok-Yan Sensors (Basel) Article MDPI 2021-06-20 /pmc/articles/PMC8235709/ /pubmed/34202977 http://dx.doi.org/10.3390/s21124220 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title An Improved Vulnerability Exploitation Prediction Model with Novel Cost Function and Custom Trained Word Vector Embedding
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8235709/
https://www.ncbi.nlm.nih.gov/pubmed/34202977
http://dx.doi.org/10.3390/s21124220