
IoTSim: Internet of Things-Oriented Binary Code Similarity Detection with Multiple Block Relations

Binary code similarity detection (BCSD) plays a crucial role in various computer security applications, including vulnerability detection, malware detection, and software component analysis. With the development of the Internet of Things (IoT), there are many binaries from different instruction arch...


Bibliographic Details
Main authors: Luo, Zhenhao; Wang, Pengfei; Xie, Wei; Zhou, Xu; Wang, Baosheng
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10535887/
https://www.ncbi.nlm.nih.gov/pubmed/37765846
http://dx.doi.org/10.3390/s23187789
author Luo, Zhenhao
Wang, Pengfei
Xie, Wei
Zhou, Xu
Wang, Baosheng
collection PubMed
description Binary code similarity detection (BCSD) plays a crucial role in various computer security applications, including vulnerability detection, malware detection, and software component analysis. With the development of the Internet of Things (IoT), there are many binaries from different instruction architecture sets, which require BCSD approaches robust against different architectures. In this study, we propose a novel IoT-oriented binary code similarity detection approach. Our approach leverages a customized transformer-based language model with disentangled attention to capture relative position information. To mitigate out-of-vocabulary (OOV) challenges in the language model, we introduce a base-token prediction pre-training task aimed at capturing basic semantics for unseen tokens. During function embedding generation, we integrate directed jumps, data dependency, and address adjacency to capture multiple block relations. We then assign different weights to different relations and use multi-layer Graph Convolutional Networks (GCN) to generate function embeddings. We implemented the prototype of IoTSim. Our experimental results show that our proposed block relation matrix improves IoTSim with large margins. With a pool size of [Formula: see text], IoTSim achieves a recall@1 of 0.903 across architectures, outperforming the state-of-the-art approaches Trex, SAFE, and PalmTree.
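The function-embedding step described in the abstract (several block relations, weighted and fed to a multi-layer GCN) and the recall@1 metric can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the relation weights, the self-loop and row normalization, the ReLU activation, the mean pooling over blocks, and the convention that query i matches pool entry i.

```python
import numpy as np

def combine_relations(jump, data_dep, adjacency, weights):
    """Weighted sum of three block-relation matrices (weights are hypothetical)."""
    w_jump, w_dep, w_adj = weights
    return w_jump * jump + w_dep * data_dep + w_adj * adjacency

def normalize(rel):
    """Add self-loops, then row-normalize so each row sums to 1 (assumed scheme)."""
    a = rel + np.eye(rel.shape[0])
    return a / a.sum(axis=1, keepdims=True)

def gcn_embed(block_feats, rel, layer_weights):
    """Apply stacked graph-convolution layers, then mean-pool blocks into one vector."""
    a_hat = normalize(rel)
    h = block_feats
    for w in layer_weights:
        h = np.maximum(a_hat @ h @ w, 0.0)  # propagate over relations, then ReLU
    return h.mean(axis=0)

def recall_at_1(queries, pool):
    """Fraction of queries whose most cosine-similar pool entry has the same index."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    best = (q @ p.T).argmax(axis=1)
    return float((best == np.arange(len(queries))).mean())

# Toy function with three basic blocks (relation matrices are made up).
jump = np.array([[0, 1, 1], [0, 0, 0], [0, 0, 0]], dtype=float)  # directed jumps
dep = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]], dtype=float)   # data dependency
adj = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)   # address adjacency
rel = combine_relations(jump, dep, adj, (1.0, 0.5, 0.25))

rng = np.random.default_rng(0)
block_feats = rng.normal(size=(3, 5))  # stand-in for language-model block embeddings
layers = [rng.normal(size=(5, 8)), rng.normal(size=(8, 4))]
embedding = gcn_embed(block_feats, rel, layers)  # one 4-dimensional function vector
```

In this sketch a block pair linked by more than one relation simply accumulates weight, so the GCN propagates more information along that edge; how IoTSim actually sets or learns the weights is not stated in this record.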
format Online
Article
Text
id pubmed-10535887
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10535887 2023-09-29 IoTSim: Internet of Things-Oriented Binary Code Similarity Detection with Multiple Block Relations Luo, Zhenhao Wang, Pengfei Xie, Wei Zhou, Xu Wang, Baosheng Sensors (Basel) Article Binary code similarity detection (BCSD) plays a crucial role in various computer security applications, including vulnerability detection, malware detection, and software component analysis. With the development of the Internet of Things (IoT), there are many binaries from different instruction architecture sets, which require BCSD approaches robust against different architectures. In this study, we propose a novel IoT-oriented binary code similarity detection approach. Our approach leverages a customized transformer-based language model with disentangled attention to capture relative position information. To mitigate out-of-vocabulary (OOV) challenges in the language model, we introduce a base-token prediction pre-training task aimed at capturing basic semantics for unseen tokens. During function embedding generation, we integrate directed jumps, data dependency, and address adjacency to capture multiple block relations. We then assign different weights to different relations and use multi-layer Graph Convolutional Networks (GCN) to generate function embeddings. We implemented the prototype of IoTSim. Our experimental results show that our proposed block relation matrix improves IoTSim with large margins. With a pool size of [Formula: see text], IoTSim achieves a recall@1 of 0.903 across architectures, outperforming the state-of-the-art approaches Trex, SAFE, and PalmTree. MDPI 2023-09-11 /pmc/articles/PMC10535887/ /pubmed/37765846 http://dx.doi.org/10.3390/s23187789 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title IoTSim: Internet of Things-Oriented Binary Code Similarity Detection with Multiple Block Relations
topic Article