
Windows malware detection based on static analysis with multiple features

Malware, or malicious software, is intrusive software that infects a computer or performs harmful activities on it. Malware has been a threat to individuals and organizations since the dawn of computers, and the research community has been struggling to develop efficient methods to detect...


Bibliographic Details
Main authors: Yousuf, Muhammad Irfan, Anwer, Izza, Riasat, Ayesha, Zia, Khawaja Tahir, Kim, Suhyun
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280383/
https://www.ncbi.nlm.nih.gov/pubmed/37346681
http://dx.doi.org/10.7717/peerj-cs.1319
_version_ 1785060781628325888
author Yousuf, Muhammad Irfan
Anwer, Izza
Riasat, Ayesha
Zia, Khawaja Tahir
Kim, Suhyun
author_facet Yousuf, Muhammad Irfan
Anwer, Izza
Riasat, Ayesha
Zia, Khawaja Tahir
Kim, Suhyun
author_sort Yousuf, Muhammad Irfan
collection PubMed
description Malware, or malicious software, is intrusive software that infects a computer or performs harmful activities on it. Malware has been a threat to individuals and organizations since the dawn of computers, and the research community has been struggling to develop efficient methods to detect it. In this work, we present a static malware detection system that detects Portable Executable (PE) malware in the Windows environment and classifies samples as benign or malware with high accuracy. First, we collect a total of 27,920 Windows PE malware samples divided into six categories and create a new dataset by extracting four types of information: the list of imported DLLs, the API functions called by these samples, the values of 52 attributes from the PE Header, and 100 attributes of the PE Section. We also combine this information to create two integrated feature sets. Second, we apply seven machine learning models (gradient boosting, decision tree, random forest, support vector machine, K-nearest neighbor, naive Bayes, and nearest centroid) and three ensemble learning techniques (Majority Voting, Stack Generalization, and AdaBoost) to classify the malware. Third, to further improve the performance of our malware detection system, we also deploy two dimensionality reduction techniques: Information Gain and Principal Component Analysis. We perform a number of experiments to test the performance and robustness of our system on both raw and selected features and show that it outperforms previous studies. By combining machine learning, ensemble learning, and dimensionality reduction techniques, we construct a static malware detection system that achieves a detection rate of 99.5% and an error rate of only 0.47%.
format Online
Article
Text
id pubmed-10280383
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10280383 2023-06-21 Windows malware detection based on static analysis with multiple features Yousuf, Muhammad Irfan Anwer, Izza Riasat, Ayesha Zia, Khawaja Tahir Kim, Suhyun PeerJ Comput Sci Data Mining and Machine Learning PeerJ Inc. 2023-04-21 /pmc/articles/PMC10280383/ /pubmed/37346681 http://dx.doi.org/10.7717/peerj-cs.1319 Text en ©2023 Yousuf et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle Data Mining and Machine Learning
Yousuf, Muhammad Irfan
Anwer, Izza
Riasat, Ayesha
Zia, Khawaja Tahir
Kim, Suhyun
Windows malware detection based on static analysis with multiple features
title Windows malware detection based on static analysis with multiple features
title_full Windows malware detection based on static analysis with multiple features
title_fullStr Windows malware detection based on static analysis with multiple features
title_full_unstemmed Windows malware detection based on static analysis with multiple features
title_short Windows malware detection based on static analysis with multiple features
title_sort windows malware detection based on static analysis with multiple features
topic Data Mining and Machine Learning
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280383/
https://www.ncbi.nlm.nih.gov/pubmed/37346681
http://dx.doi.org/10.7717/peerj-cs.1319
work_keys_str_mv AT yousufmuhammadirfan windowsmalwaredetectionbasedonstaticanalysiswithmultiplefeatures
AT anwerizza windowsmalwaredetectionbasedonstaticanalysiswithmultiplefeatures
AT riasatayesha windowsmalwaredetectionbasedonstaticanalysiswithmultiplefeatures
AT ziakhawajatahir windowsmalwaredetectionbasedonstaticanalysiswithmultiplefeatures
AT kimsuhyun windowsmalwaredetectionbasedonstaticanalysiswithmultiplefeatures
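
Illustrative note on the description field: the abstract names Information Gain as one of the two dimensionality reduction techniques applied to the extracted features. The record does not include the authors' code, so the following is only a minimal Python sketch of the general idea, ranking binary "imports this DLL" features by information gain. The function names, the three DLL columns, and the four toy samples are assumptions invented for this illustration and are not taken from the article's dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [
        (name, information_gain([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration: 1 means the sample imports the DLL.
names = ["kernel32.dll", "wininet.dll", "advapi32.dll"]
rows = [
    (1, 1, 0),
    (1, 1, 1),
    (1, 0, 0),
    (1, 0, 1),
]
labels = ["malware", "malware", "benign", "benign"]

ranking = rank_features(rows, labels, names)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy set, the column that is present in every sample carries no information about the label, while the column that splits the classes cleanly has the highest gain. A feature selector of this kind would keep only the top-ranked columns before training a classifier; how many to keep is a tuning choice that this record does not specify.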