
A Kullback-Liebler divergence-based representation algorithm for malware detection

BACKGROUND: Malware, malicious software, is the major security concern of the digital realm. Conventional cyber-security solutions are challenged by sophisticated malicious behaviors. Currently, an overlap between malicious and legitimate behaviors causes more difficulties in characterizing those be...


Bibliographic Details
Main authors:	Aboaoja, Faitouri A.; Zainal, Anazida; Ghaleb, Fuad A.; Alghamdi, Norah Saleh; Saeed, Faisal; Alhuwayji, Husayn
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2023
Subjects:	Artificial Intelligence
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10557483/ https://www.ncbi.nlm.nih.gov/pubmed/37810364 http://dx.doi.org/10.7717/peerj-cs.1492

author	Aboaoja, Faitouri A.; Zainal, Anazida; Ghaleb, Fuad A.; Alghamdi, Norah Saleh; Saeed, Faisal; Alhuwayji, Husayn
collection	PubMed
description	BACKGROUND: Malware (malicious software) is the major security concern of the digital realm. Conventional cyber-security solutions are challenged by sophisticated malicious behaviors. The overlap between malicious and legitimate behaviors makes it harder to characterize a given behavior as malicious or legitimate. For instance, evasive malware often mimics legitimate behaviors, and evasion techniques are used by both legitimate and malicious software. PROBLEM: Most existing solutions use the traditional term frequency-inverse document frequency (TF-IDF) technique, or its underlying concept, to represent malware behaviors. However, traditional TF-IDF and the techniques derived from it represent features inaccurately, especially features shared by both classes, because they compute each feature's weight from its distribution across all documents without considering its distribution within each class. This can reduce the meaning of those features and, when they are used to classify malware, lead to a high false-alarm rate. METHOD: This study proposes a Kullback-Leibler Divergence-based Term Frequency-Probability Class Distribution (KLD-based TF-PCD) algorithm that represents the extracted features based on the differences between the probability distributions of the terms in the malware and benign classes. Unlike existing solutions, the proposed algorithm increases the weights of important features by using the Kullback-Leibler divergence to measure the differences between their probability distributions in the malware and benign classes. RESULTS: The experimental results show that the proposed KLD-based TF-PCD algorithm achieved an accuracy of 0.972, a false positive rate of 0.037, and an F-measure of 0.978. These results were significant compared with related work. Thus, the proposed KLD-based TF-PCD algorithm contributes to improving the security of cyberspace. CONCLUSION: The proposed algorithm adds new meaningful characteristics that improve the knowledge learned by the classifiers and thus increase their ability to classify malicious behaviors accurately.
format	Online Article Text
id	pubmed-10557483
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-10557483 2023-10-07 PeerJ Comput Sci. PeerJ Inc. 2023-09-22 /pmc/articles/PMC10557483/ /pubmed/37810364 http://dx.doi.org/10.7717/peerj-cs.1492 Text en ©2023 Aboaoja et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	A Kullback-Liebler divergence-based representation algorithm for malware detection
topic	Artificial Intelligence
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10557483/ https://www.ncbi.nlm.nih.gov/pubmed/37810364 http://dx.doi.org/10.7717/peerj-cs.1492
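
The description above contrasts weights computed over all documents (the TF-IDF idea) with weights based on how a term is distributed in the malware and benign classes. The following is a minimal sketch of that contrast, not the authors' KLD-based TF-PCD algorithm, whose exact formula is not given in this record. All function names, the add-one smoothing, the per-class "document contains the term" probability, and the toy token lists are assumptions made for illustration.

```python
import math
from collections import Counter


def class_presence_prob(docs, term, alpha=1.0):
    """Smoothed probability that a document of one class contains `term`.

    Assumption: alpha > 0, so the result is never exactly 0 or 1.
    """
    hits = sum(1 for d in docs if term in d)
    return (hits + alpha) / (len(docs) + 2 * alpha)


def bernoulli_kl(p, q):
    """KL divergence D(P || Q) between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kld_term_weights(malware_docs, benign_docs, alpha=1.0):
    """Hypothetical class-aware weight: divergence between the two classes."""
    vocab = {t for d in malware_docs + benign_docs for t in d}
    return {
        t: bernoulli_kl(
            class_presence_prob(malware_docs, t, alpha),
            class_presence_prob(benign_docs, t, alpha),
        )
        for t in vocab
    }


def idf(all_docs, term):
    """Plain inverse document frequency, computed over all documents."""
    df = sum(1 for d in all_docs if term in d)
    return math.log(len(all_docs) / df)


def represent(doc, weights):
    """Term frequency in `doc` scaled by the class-aware weight."""
    tf = Counter(doc)
    return {t: (n / len(doc)) * weights.get(t, 0.0) for t, n in tf.items()}


# Toy data (invented): each document is a list of behavior tokens.
malware = [["inject", "read"], ["inject", "write"]]
benign = [["read", "sleep"], ["write", "sleep"]]

weights = kld_term_weights(malware, benign)
vector = represent(["inject", "read"], weights)
```

In this toy data every term occurs in two of the four documents, so `idf` gives them all the same weight. The class-aware weight separates them: `read` and `write` are equally likely in both classes and get a weight of zero, while `inject` and `sleep`, which occur in only one class, get a positive weight.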
