Double-Layer Detection Model of Malicious PDF Documents Based on Entropy Method with Multiple Features

Bibliographic Details
Main Authors: Song, Enzhou; Hu, Tao; Yi, Peng; Wang, Wenbo
Format: Online, Article, Text
Language: English
Published: MDPI, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10378245/
https://www.ncbi.nlm.nih.gov/pubmed/37510046
http://dx.doi.org/10.3390/e25071099

Abstract

Traditional PDF document detection technology usually builds a rule or feature library for specific vulnerabilities; it is therefore suited only to single detection targets and lacks resistance to anti-detection techniques. To address these shortcomings, we build a double-layer detection model for malicious PDF documents based on an entropy method with multiple features. First, we address the single-detection-target problem by fusing 222 features, including 130 basic features (such as objects, structure, content streams, and metadata) and 82 dangerous features (such as suspicious and encoding functions), which can effectively resist obfuscation and encryption. Second, we generate the optimal feature set (153 features in total) by applying an entropy method based on RReliefF and MIC (EMBORAM) to PDF samples covering 37 typical document vulnerabilities; this set can effectively resist anti-detection methods such as data filling and imitation attacks. Finally, we build a double-layer processing framework that detects samples efficiently using an AdaBoost-optimized random forest algorithm and a robustness-optimized support vector machine algorithm. Compared with the traditional static detection method, this model performs better on various evaluation criteria. The average detection time per document is 1.3 ms, and the accuracy reaches 95.9%.
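The abstract does not say how EMBORAM combines its two criteria. As an illustration only, the sketch below applies the standard entropy weight method to precomputed per-feature RReliefF and MIC (maximal information coefficient) scores: each criterion is min-max normalized, a criterion whose scores vary more across features gets lower entropy and thus a larger weight, and features are ranked by the weighted sum. The function names, the min-max normalization, and the linear scoring are assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def min_max(column: Sequence[float]) -> List[float]:
    """Rescale one criterion column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def entropy_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Standard entropy weight method over an m x n score matrix.

    Rows are candidate features, columns are criteria (assumed here: RReliefF, MIC).
    A criterion whose normalized scores are more uneven across features has lower
    entropy and therefore receives a larger weight.
    """
    m = len(matrix)
    if m < 2:
        raise ValueError("need at least two candidate features")
    n = len(matrix[0])
    k = 1.0 / math.log(m)  # scales each entropy into [0, 1]
    divergences = []
    for j in range(n):
        col = min_max([row[j] for row in matrix])
        total = sum(col)
        if total == 0:
            entropy = 1.0  # constant criterion: carries no discriminative information
        else:
            # p * ln(p) is taken as 0 when p == 0, so zero entries are skipped
            entropy = -k * sum((v / total) * math.log(v / total) for v in col if v > 0)
        divergences.append(1.0 - entropy)
    d_sum = sum(divergences)
    if d_sum == 0:
        return [1.0 / n] * n  # every criterion uninformative: fall back to equal weights
    return [d / d_sum for d in divergences]


def select_features(
    names: Sequence[str],
    relieff: Sequence[float],
    mic: Sequence[float],
    top_k: int,
) -> Tuple[List[str], List[float]]:
    """Rank features by an entropy-weighted sum of normalized RReliefF and MIC scores.

    Hypothetical helper: the linear combination is an assumption, not EMBORAM itself.
    """
    weights = entropy_weights([[r, c] for r, c in zip(relieff, mic)])
    cols = [min_max(relieff), min_max(mic)]
    score: Dict[str, float] = {
        name: weights[0] * cols[0][i] + weights[1] * cols[1][i]
        for i, name in enumerate(names)
    }
    ranked = sorted(names, key=lambda nm: score[nm], reverse=True)
    return ranked[:top_k], weights


if __name__ == "__main__":
    # Made-up feature names and scores, for illustration only.
    names = ["js_count", "openaction_count", "page_count", "author_len"]
    relieff = [0.9, 0.7, 0.1, 0.05]
    mic = [0.8, 0.6, 0.3, 0.1]
    top, w = select_features(names, relieff, mic, top_k=2)
    print("criterion weights (RReliefF, MIC):", w)
    print("selected:", top)
```

With the numbers reported above, `top_k` would be 153 out of the fused feature set. The feature names and scores in the example are invented, and in practice the RReliefF and MIC values would come from separate implementations of those measures.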

Journal: Entropy (Basel). Published online: 2023-07-23.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).