Double-Layer Detection Model of Malicious PDF Documents Based on Entropy Method with Multiple Features

Bibliographic Details
Main authors:	Song, Enzhou; Hu, Tao; Yi, Peng; Wang, Wenbo
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10378245/ https://www.ncbi.nlm.nih.gov/pubmed/37510046 http://dx.doi.org/10.3390/e25071099

author	Song, Enzhou; Hu, Tao; Yi, Peng; Wang, Wenbo
collection	PubMed
description	Traditional PDF document detection technology usually builds a rule or feature library for specific vulnerabilities; it is therefore suited only to single detection targets and lacks resistance to anti-detection techniques. To address these shortcomings, we build a double-layer detection model for malicious PDF documents based on an entropy method with multiple features. First, we address the single-detection-target problem by fusing 222 features, including 130 basic features (such as objects, structure, content stream, and metadata) and 82 dangerous features (such as suspicious and encoding functions), which can effectively resist obfuscation and encryption. Second, we generate the optimal feature set (153 features in total) by applying an entropy method based on RReliefF and MIC (EMBORAM) to PDF samples covering 37 typical document vulnerabilities; this set can effectively resist anti-detection methods such as data filling and imitation attacks. Finally, we build a double-layer processing framework that detects samples efficiently using an AdaBoost-optimized random forest algorithm and a robustness-optimized support vector machine algorithm. Compared with traditional static detection methods, this model performs better on various evaluation criteria. The average detection time per document is 1.3 ms, and the accuracy reaches 95.9%.
format	Online Article Text
id	pubmed-10378245
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
journal	Entropy (Basel)
published	2023-07-23
rights	© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Double-Layer Detection Model of Malicious PDF Documents Based on Entropy Method with Multiple Features
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10378245/ https://www.ncbi.nlm.nih.gov/pubmed/37510046 http://dx.doi.org/10.3390/e25071099
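The abstract in the description field lists basic features (objects, structure, content stream, metadata) and dangerous features (suspicious and encoding functions) that are meant to resist obfuscation. The sketch below is not the authors' extractor. It assumes a PDFiD-style approach: count PDF keywords in the raw bytes after decoding `#xx` hex escapes inside names, which the PDF format permits (so `/J#61vaScript` spells `/JavaScript`). The `KEYWORDS` list and the helper names `normalize_names`, `keyword_pattern`, and `keyword_features` are hypothetical. Compressed streams are not decoded here, so keywords hidden inside `/FlateDecode` streams would be missed.

```python
import re

# Hypothetical keyword list for illustration: a few structural ("basic")
# tokens and a few risky ("dangerous") tokens. Not the paper's feature set.
KEYWORDS = [
    "obj", "endobj", "stream", "endstream", "xref", "trailer", "/ObjStm",
    "/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch",
    "/EmbeddedFile", "/URI", "/AcroForm", "/XFA",
    "/FlateDecode", "/ASCIIHexDecode",
]

# Regular characters that may continue a PDF name: anything except
# whitespace and the PDF delimiters / < > [ ] ( ) % { }.
NAME_CHAR = rb"[^\s/<>\[\]()%{}]"
NAME_RE = re.compile(rb"/" + NAME_CHAR + rb"*")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_names(data: bytes) -> bytes:
    """Decode #xx escapes inside names, e.g. /J#61vaScript -> /JavaScript."""
    def decode(name: re.Match) -> bytes:
        return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), name.group(0))
    return NAME_RE.sub(decode, data)


def keyword_pattern(keyword: str) -> re.Pattern:
    body = re.escape(keyword.encode("ascii"))
    if keyword.startswith("/"):
        # A name token ends where no regular name character follows (/JS vs /JSX).
        return re.compile(body + rb"(?!" + NAME_CHAR + rb")")
    # Bare keywords must not sit inside a longer word ("obj" inside "endobj").
    return re.compile(rb"(?<![A-Za-z])" + body + rb"(?![A-Za-z])")


PATTERNS = {kw: keyword_pattern(kw) for kw in KEYWORDS}


def keyword_features(data: bytes) -> dict:
    """Return a keyword -> count mapping for raw PDF bytes (streams not decoded)."""
    text = normalize_names(data)
    return {kw: len(p.findall(text)) for kw, p in PATTERNS.items()}


# Toy input: one plain and one hex-obfuscated /JavaScript name.
sample = (
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R /Names << /J#61vaScript 3 0 R >> >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)
features = keyword_features(sample)
print({k: v for k, v in features.items() if v})
# -> {'obj': 2, 'endobj': 2, '/JS': 1, '/JavaScript': 2, '/OpenAction': 1}
```

Normalizing names before counting is what lets the obfuscated `/J#61vaScript` be counted as `/JavaScript`; a feature vector built from raw substring counts would miss it.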
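The abstract does not spell out how EMBORAM combines RReliefF and MIC. As one plausible reading, assumed here and not confirmed by the record, the classic entropy weight method can fuse the two scores into a single feature ranking. Treat each feature i as an alternative and each criterion j (RReliefF, MIC) as a column. Min-max normalize each column, form shares p_ij = x_ij / sum_i x_ij, compute the entropy e_j = -(1 / ln m) * sum_i p_ij ln p_ij over m features, and set the weight w_j = (1 - e_j) / sum_k (1 - e_k). A criterion whose scores are spread more unevenly across features has lower entropy and receives more weight. The RReliefF and MIC scores are taken as given inputs, and the helper names and toy numbers are hypothetical.

```python
import math


def min_max(column):
    """Rescale one criterion's scores to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [1.0] * len(column)  # a constant criterion carries no information
    return [(x - lo) / (hi - lo) for x in column]


def entropy_weights(matrix):
    """Entropy weight method. matrix[i][j] is the score of feature i under criterion j."""
    m, n = len(matrix), len(matrix[0])
    if m < 2:
        raise ValueError("need at least two features")
    cols = [min_max([row[j] for row in matrix]) for j in range(n)]
    k = 1.0 / math.log(m)  # scales the entropy into [0, 1]
    divergences = []
    for col in cols:
        total = sum(col)  # > 0: min_max always yields at least one 1.0
        # Shannon entropy of the column's share distribution;
        # zero shares are skipped, i.e. 0 * ln(0) is treated as 0.
        entropy = -k * sum((x / total) * math.log(x / total) for x in col if x > 0)
        divergences.append(max(0.0, 1.0 - entropy))  # clamp tiny float negatives
    s = sum(divergences)
    if s == 0:
        return [1.0 / n] * n, cols  # no criterion discriminates: weight equally
    return [d / s for d in divergences], cols


def fuse_scores(matrix, top_k):
    """Combine per-criterion scores with entropy weights and keep the top_k features."""
    weights, cols = entropy_weights(matrix)
    fused = [sum(w * col[i] for w, col in zip(weights, cols)) for i in range(len(matrix))]
    ranking = sorted(range(len(matrix)), key=lambda i: fused[i], reverse=True)
    return weights, fused, ranking[:top_k]


# Hypothetical toy scores per feature: (RReliefF weight, MIC value).
scores = [
    (0.30, 0.80),
    (0.05, 0.10),
    (0.20, 0.60),
    (0.01, 0.55),
]
weights, fused, top = fuse_scores(scores, top_k=2)
print(weights, top)  # top -> [0, 2]
```

In this toy input the RReliefF column is more uneven than the MIC column, so it ends up with the larger weight. The paper keeps 153 features; `top_k` plays that role here.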