FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm

In recent years, the number of vulnerabilities discovered and publicly disclosed has shown a sharp upward trend. However, the value of exploitation of vulnerabilities varies for attackers, considering that only a small fraction of vulnerabilities are exploited. Therefore, the realization of quick ex...

Bibliographic Details
Main Authors: Fang, Yong, Liu, Yongcheng, Huang, Cheng, Liu, Liang
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7004314/
https://www.ncbi.nlm.nih.gov/pubmed/32027693
http://dx.doi.org/10.1371/journal.pone.0228439
author Fang, Yong
Liu, Yongcheng
Huang, Cheng
Liu, Liang
collection PubMed
description In recent years, the number of vulnerabilities discovered and publicly disclosed has risen sharply. However, vulnerabilities differ in their value to attackers, and only a small fraction are ever exploited. Organizations with limited resources therefore need to exclude non-exploitable vulnerabilities quickly and prioritize patches optimally. Recent work uses machine learning to predict exploited vulnerabilities from features extracted from open-source intelligence (OSINT). However, given the explosive growth of vulnerability information, past methods leave room for improvement when applied to multiple threat intelligence sources, and a more general method is needed. Moreover, previous methods processed vulnerability-related descriptions with traditional text-processing techniques, which capture only static statistical characteristics and ignore the context and meaning of the words. To address these challenges, we propose an exploit prediction model called fastEmbed, based on a combination of the fastText and LightGBM algorithms. We replicate key portions of the state-of-the-art work on exploit prediction and use them as benchmark models. By learning embeddings of vulnerability-related text on extremely imbalanced data sets, our model outperforms the baseline model in both generalization ability and prediction ability without temporal intermixing, with an average overall improvement of 6.283%. In predicting exploits in the wild, our model also outperforms the baseline model, with an F1 measure of 0.586 on the minority class (a 33.577% improvement over the work using features from darkweb/deepweb). The results demonstrate that the model can better describe the exploitability of vulnerabilities and effectively predict exploits in the wild.
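The abstract reports its headline result as an F1 measure on the minority class. The sketch below illustrates why that metric is a natural choice on extremely imbalanced data. It is not the authors' evaluation code: the helper names `f1_for_class` and `accuracy` and the toy labels (5 exploited vulnerabilities out of 100) are assumptions made purely for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, positive=1):
    """F1 score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = exploited (minority), 0 = not exploited.
y_true = [1] * 5 + [0] * 95

# A trivial classifier that never predicts "exploited".
y_pred_all_negative = [0] * 100

# A hypothetical model: finds 3 of the 5 exploited, raises 2 false alarms.
y_pred_model = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93

for name, y_pred in [("all-negative", y_pred_all_negative),
                     ("model", y_pred_model)]:
    print(name,
          "accuracy=%.2f" % accuracy(y_true, y_pred),
          "minority F1=%.2f" % f1_for_class(y_true, y_pred))
```

On this toy data the two classifiers differ by only one point of accuracy (0.95 versus 0.96), yet the trivial classifier scores a minority-class F1 of 0 while the hypothetical model scores 0.6. This gap is why results on imbalanced exploit data are usually reported per class.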
format Online
Article
Text
id pubmed-7004314
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7004314 2020-02-18 FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm. Fang, Yong; Liu, Yongcheng; Huang, Cheng; Liu, Liang. PLoS One. Research Article. Public Library of Science 2020-02-06 /pmc/articles/PMC7004314/ /pubmed/32027693 http://dx.doi.org/10.1371/journal.pone.0228439 Text en © 2020 Fang et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm
topic Research Article