Efficient Feature Selection for Static Analysis Vulnerability Prediction
Main authors: | Filus, Katarzyna; Boryszko, Paweł; Domańska, Joanna; Siavvas, Miltiadis; Gelenbe, Erol |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7915846/ https://www.ncbi.nlm.nih.gov/pubmed/33561957 http://dx.doi.org/10.3390/s21041133 |
Field | Value |
---|---|
author | Filus, Katarzyna; Boryszko, Paweł; Domańska, Joanna; Siavvas, Miltiadis; Gelenbe, Erol |
collection | PubMed |
description | Common software vulnerabilities can result in severe security breaches, financial losses, and reputation deterioration and require research effort to improve software security. The acceleration of the software production cycle, limited testing resources, and the lack of security expertise among programmers require the identification of efficient software vulnerability predictors to highlight the system components on which testing should be focused. Although static code analyzers are often used to improve software quality together with machine learning and data mining for software vulnerability prediction, the work regarding the selection and evaluation of different types of relevant vulnerability features is still limited. Thus, in this paper, we examine features generated by SonarQube and CCCC tools, to identify those that can be used for software vulnerability prediction. We investigate the suitability of thirty-three different features to train thirteen distinct machine learning algorithms to design vulnerability predictors and identify the most relevant features that should be used for training. Our evaluation is based on a comprehensive feature selection process based on the correlation analysis of the features, together with four well-known feature selection techniques. Our experiments, using a large publicly available dataset, facilitate the evaluation and result in the identification of small, but efficient sets of features for software vulnerability prediction. |
format | Online Article Text |
id | pubmed-7915846 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7915846 2021-03-01; Sensors (Basel); Article; MDPI 2021-02-06; /pmc/articles/PMC7915846/ /pubmed/33561957 http://dx.doi.org/10.3390/s21041133; Text en; © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | Efficient Feature Selection for Static Analysis Vulnerability Prediction |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7915846/ https://www.ncbi.nlm.nih.gov/pubmed/33561957 http://dx.doi.org/10.3390/s21041133 |
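The abstract describes a feature selection process based on correlation analysis of the features. The following is a minimal sketch, assuming that this analysis takes the common form of a redundancy filter: a feature is discarded when it is strongly linearly correlated with a feature already kept. This is not necessarily the authors' exact procedure. The 0.8 threshold, the greedy keep-first ordering, and the feature names `loc`, `statements`, and `comment_ratio` are hypothetical illustrations. None of them are SonarQube or CCCC outputs or settings taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no linear signal; treat as uncorrelated
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.8):
    """Greedy redundancy filter (an assumed stand-in for the paper's correlation step).

    Features are visited in dict insertion order; a feature is kept only if its
    absolute correlation with every already-kept feature is at most `threshold`.
    The 0.8 default is a hypothetical choice, not a value from the paper.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-component metric values, for illustration only.
example = {
    "loc": [10, 20, 30, 40, 50],
    "statements": [11, 19, 31, 41, 49],  # nearly linear in "loc", so redundant
    "comment_ratio": [0.3, 0.1, 0.4, 0.2, 0.5],  # |r| with "loc" is 0.5
}

print(drop_correlated(example))  # ['loc', 'comment_ratio']
```

Because the filter is greedy, the order of features decides which member of a correlated pair survives. A filter like this only removes redundancy among the inputs. Ranking the remaining features by relevance to the vulnerability label would still need a separate feature selection technique, such as the four the abstract mentions.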