Cargando…

Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach

Software products from all vendors have vulnerabilities that can cause a security concern. Malware is used as a prime exploitation tool to exploit these vulnerabilities. Machine learning (ML) methods are efficient in detecting malware and are state-of-art. The effectiveness of ML models can be augme...

Descripción completa

Detalles Bibliográficos
Autores principales: Kumar, Rajesh, Subbiah, Geetha
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9002855/
https://www.ncbi.nlm.nih.gov/pubmed/35408413
http://dx.doi.org/10.3390/s22072798
_version_ 1784685990884933632
author Kumar, Rajesh
Subbiah, Geetha
author_facet Kumar, Rajesh
Subbiah, Geetha
author_sort Kumar, Rajesh
collection PubMed
description Software products from all vendors have vulnerabilities that can cause a security concern. Malware is used as a prime exploitation tool to exploit these vulnerabilities. Machine learning (ML) methods are efficient in detecting malware and are state-of-art. The effectiveness of ML models can be augmented by reducing false negatives and false positives. In this paper, the performance of bagging and boosting machine learning models is enhanced by reducing misclassification. Shapley values of features are a true representation of the amount of contribution of features and help detect top features for any prediction by the ML model. Shapley values are transformed to probability scale to correlate with a prediction value of ML model and to detect top features for any prediction by a trained ML model. The trend of top features derived from false negative and false positive predictions by a trained ML model can be used for making inductive rules. In this work, the best performing ML model in bagging and boosting is determined by the accuracy and confusion matrix on three malware datasets from three different periods. The best performing ML model is used to make effective inductive rules using waterfall plots based on the probability scale of features. This work helps improve cyber security scenarios by effective detection of false-negative zero-day malware.
format Online
Article
Text
id pubmed-9002855
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-90028552022-04-13 Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach Kumar, Rajesh Subbiah, Geetha Sensors (Basel) Article Software products from all vendors have vulnerabilities that can cause a security concern. Malware is used as a prime exploitation tool to exploit these vulnerabilities. Machine learning (ML) methods are efficient in detecting malware and are state-of-art. The effectiveness of ML models can be augmented by reducing false negatives and false positives. In this paper, the performance of bagging and boosting machine learning models is enhanced by reducing misclassification. Shapley values of features are a true representation of the amount of contribution of features and help detect top features for any prediction by the ML model. Shapley values are transformed to probability scale to correlate with a prediction value of ML model and to detect top features for any prediction by a trained ML model. The trend of top features derived from false negative and false positive predictions by a trained ML model can be used for making inductive rules. In this work, the best performing ML model in bagging and boosting is determined by the accuracy and confusion matrix on three malware datasets from three different periods. The best performing ML model is used to make effective inductive rules using waterfall plots based on the probability scale of features. This work helps improve cyber security scenarios by effective detection of false-negative zero-day malware. MDPI 2022-04-06 /pmc/articles/PMC9002855/ /pubmed/35408413 http://dx.doi.org/10.3390/s22072798 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Kumar, Rajesh
Subbiah, Geetha
Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach
title Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach
title_full Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach
title_fullStr Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach
title_full_unstemmed Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach
title_short Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach
title_sort zero-day malware detection and effective malware analysis using shapley ensemble boosting and bagging approach
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9002855/
https://www.ncbi.nlm.nih.gov/pubmed/35408413
http://dx.doi.org/10.3390/s22072798
work_keys_str_mv AT kumarrajesh zerodaymalwaredetectionandeffectivemalwareanalysisusingshapleyensembleboostingandbaggingapproach
AT subbiahgeetha zerodaymalwaredetectionandeffectivemalwareanalysisusingshapleyensembleboostingandbaggingapproach