Homology analysis of malware based on ensemble learning and multifeatures
Main authors: | Xue, Di; Li, Jingmei; Wu, Weifei; Tian, Qiao; Wang, JiaXiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2019 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6709908/ https://www.ncbi.nlm.nih.gov/pubmed/31449533 http://dx.doi.org/10.1371/journal.pone.0211373 |
_version_ | 1783446261159952384 |
---|---|
author | Xue, Di; Li, Jingmei; Wu, Weifei; Tian, Qiao; Wang, JiaXiang |
collection | PubMed |
description | With the exponential increase in malware, homology analysis has become an active research topic in the malware detection field. This paper proposes MHAS, a malware homology analysis system based on ensemble learning and multifeatures. MHAS generates grayscale images from malware binary files and then uses the disassembler IDA Pro to extract opcode sequences and system call graphs, from which RGB images and M-images are generated on the basis of the image matrix. MHAS then uses convolutional neural networks (CNNs) as base learners in bagging ensemble learning to learn features from the grayscale images, RGB images and M-images. Next, MHAS integrates the nine base learners using voting, learning and selective ensemble (in that order) and maps the integration results to a result matrix. Finally, the result matrix is integrated again using the learning method to obtain the final malware classification result. To verify the accuracy of MHAS, we performed a malware family classification experiment that included samples of 10 malware families. MHAS achieved an accuracy of 99.17%, indicating that it can effectively analyze and identify malware families. |
format | Online Article Text |
id | pubmed-6709908 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One |
published_online | 2019-08-26 |
license | © 2019 Xue et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Homology analysis of malware based on ensemble learning and multifeatures |
topic | Research Article |
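
The abstract describes two families of steps: turning binaries into grayscale images, and integrating base learners by voting, learning and selective ensemble. As a minimal sketch, assuming the common convention of reading a binary's bytes as 8-bit pixels at a fixed row width, plain hard majority voting, and a simple top-k validation-accuracy rule for selection, the Python below illustrates the first conversion step and two of the integration ideas. The function names (`bytes_to_grayscale`, `majority_vote`, `select_learners`), the 256-pixel default width and the top-k rule are hypothetical illustrations, not the MHAS implementation, which this record does not describe in detail.

```python
from collections import Counter
from typing import List, Sequence


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Lay out a binary's raw bytes as rows of 8-bit grayscale pixels.

    Each byte (0-255) becomes one pixel intensity and the final row is
    zero-padded. The default width is an assumption for illustration only.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = list(data)                              # bytes -> ints in 0..255
    pixels.extend([0] * ((-len(pixels)) % width))    # pad the last row
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard voting: predictions[m][s] is learner m's family label for sample s.

    Ties go to the tied label that appears first in learner order, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    if not predictions:
        raise ValueError("need at least one base learner")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all learners must predict the same samples")
    return [
        Counter(p[s] for p in predictions).most_common(1)[0][0]
        for s in range(n_samples)
    ]


def select_learners(val_predictions: Sequence[Sequence[int]],
                    val_labels: Sequence[int], k: int) -> List[int]:
    """Simplified selective ensemble: indices of the k learners with the
    highest validation accuracy (a hypothetical rule, not MHAS's criterion).
    Equal accuracies keep their original order since sorted() is stable."""
    if not val_labels:
        raise ValueError("need at least one validation label")

    def accuracy(pred: Sequence[int]) -> float:
        return sum(p == y for p, y in zip(pred, val_labels)) / len(val_labels)

    ranked = sorted(range(len(val_predictions)),
                    key=lambda m: accuracy(val_predictions[m]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # 9 bytes at width 4 -> 3 rows, the last one padded with three zeros
    print(bytes_to_grayscale(b"\x00\x10\xff" * 3, width=4))
    # [[0, 16, 255, 0], [16, 255, 0, 16], [255, 0, 0, 0]]

    val_preds = [[0, 1, 2, 1], [0, 0, 2, 1], [1, 0, 0, 1], [0, 1, 2, 0]]
    keep = select_learners(val_preds, [0, 1, 2, 1], k=3)
    print(keep)  # [0, 1, 3]

    test_preds = [[0, 1, 2], [0, 2, 2], [1, 1, 0], [0, 1, 1]]
    print(majority_vote([test_preds[m] for m in keep]))  # [0, 1, 2]
```

Replacing the top-k rule with a search over learner subsets, or the hard vote with a trained meta-classifier (stacking, which is one plausible reading of the "learning" strategy), would move this sketch closer to the selective and learning ensembles the abstract names.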