
Efficient Feature Selection for Static Analysis Vulnerability Prediction


Bibliographic Details
Main authors: Filus, Katarzyna; Boryszko, Paweł; Domańska, Joanna; Siavvas, Miltiadis; Gelenbe, Erol
Format: Online Article Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7915846/
https://www.ncbi.nlm.nih.gov/pubmed/33561957
http://dx.doi.org/10.3390/s21041133
author Filus, Katarzyna
Boryszko, Paweł
Domańska, Joanna
Siavvas, Miltiadis
Gelenbe, Erol
collection PubMed
description Common software vulnerabilities can result in severe security breaches, financial losses, and reputation deterioration and require research effort to improve software security. The acceleration of the software production cycle, limited testing resources, and the lack of security expertise among programmers require the identification of efficient software vulnerability predictors to highlight the system components on which testing should be focused. Although static code analyzers are often used to improve software quality together with machine learning and data mining for software vulnerability prediction, the work regarding the selection and evaluation of different types of relevant vulnerability features is still limited. Thus, in this paper, we examine features generated by SonarQube and CCCC tools, to identify those that can be used for software vulnerability prediction. We investigate the suitability of thirty-three different features to train thirteen distinct machine learning algorithms to design vulnerability predictors and identify the most relevant features that should be used for training. Our evaluation is based on a comprehensive feature selection process based on the correlation analysis of the features, together with four well-known feature selection techniques. Our experiments, using a large publicly available dataset, facilitate the evaluation and result in the identification of small, but efficient sets of features for software vulnerability prediction.
format Online
Article
Text
id pubmed-7915846
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7915846 2021-03-01 Efficient Feature Selection for Static Analysis Vulnerability Prediction Filus, Katarzyna Boryszko, Paweł Domańska, Joanna Siavvas, Miltiadis Gelenbe, Erol Sensors (Basel) Article MDPI 2021-02-06 /pmc/articles/PMC7915846/ /pubmed/33561957 http://dx.doi.org/10.3390/s21041133 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Efficient Feature Selection for Static Analysis Vulnerability Prediction
topic Article