
Homology analysis of malware based on ensemble learning and multifeatures

With the exponential increase in malware, homology analysis has become a hot research topic in the malware detection field. This paper proposes MHAS, a malware homology analysis system based on ensemble learning and multifeatures. MHAS generates grayscale images from malware binary files and then us...


Bibliographic Details
Main authors: Xue, Di, Li, Jingmei, Wu, Weifei, Tian, Qiao, Wang, JiaXiang
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6709908/
https://www.ncbi.nlm.nih.gov/pubmed/31449533
http://dx.doi.org/10.1371/journal.pone.0211373
_version_ 1783446261159952384
author Xue, Di
Li, Jingmei
Wu, Weifei
Tian, Qiao
Wang, JiaXiang
author_sort Xue, Di
collection PubMed
description With the exponential increase in malware, homology analysis has become a hot research topic in the malware detection field. This paper proposes MHAS, a malware homology analysis system based on ensemble learning and multifeatures. MHAS generates grayscale images from malware binary files and then uses the disassembler IDA Pro to extract opcode sequences and system call graphs, from which RGB images and M-images are generated on the image matrix. MHAS then uses convolutional neural networks (CNNs) as base learners in a bagging ensemble to learn features from the grayscale images, RGB images and M-images. Next, MHAS integrates the nine base learners using voting, learning and selective ensemble (in that order) and maps the integration results to the result matrix. Finally, the result matrix is integrated once more using the learning method to obtain the final malware classification result. To verify the accuracy of MHAS, we performed a malware family classification experiment that included samples from 10 malware families. The results showed that MHAS reaches an accuracy of 99.17%, indicating that it can effectively analyze and identify malware families.
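Two of the steps named in the abstract, turning a binary file into a grayscale image and combining base-learner outputs by voting, can be illustrated with a minimal sketch. The abstract does not say how MHAS lays out the image or breaks ties, so the one-byte-per-pixel mapping, the fixed row width, the zero padding, the tie-breaking rule, and the function names `bytes_to_grayscale` and `majority_vote` are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte (0-255) to one pixel and wrap rows at `width`.

    Assumption of this sketch: fixed row width, last row padded with zeros.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad the final row with black pixels
        rows.append(row)
    return rows


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most base learners.

    Assumption of this sketch: ties go to the label seen first.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical inputs: five raw bytes and three base-learner predictions.
image = bytes_to_grayscale(b"\x00\x7f\xff\x10\x20", width=4)
label = majority_vote(["family_a", "family_b", "family_a"])
```

In this sketch `image` is a list of pixel rows that could be passed to any image library, and `label` is the family chosen by plain majority voting; the learning and selective-ensemble stages described in the abstract are not covered here.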
format Online
Article
Text
id pubmed-6709908
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6709908 2019-09-10 Homology analysis of malware based on ensemble learning and multifeatures Xue, Di Li, Jingmei Wu, Weifei Tian, Qiao Wang, JiaXiang PLoS One Research Article Public Library of Science 2019-08-26 /pmc/articles/PMC6709908/ /pubmed/31449533 http://dx.doi.org/10.1371/journal.pone.0211373 Text en © 2019 Xue et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Homology analysis of malware based on ensemble learning and multifeatures
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6709908/
https://www.ncbi.nlm.nih.gov/pubmed/31449533
http://dx.doi.org/10.1371/journal.pone.0211373