IoTSim: Internet of Things-Oriented Binary Code Similarity Detection with Multiple Block Relations
Binary code similarity detection (BCSD) plays a crucial role in various computer security applications, including vulnerability detection, malware detection, and software component analysis. With the development of the Internet of Things (IoT), there are many binaries from different instruction arch...
Main authors: | Luo, Zhenhao; Wang, Pengfei; Xie, Wei; Zhou, Xu; Wang, Baosheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10535887/ https://www.ncbi.nlm.nih.gov/pubmed/37765846 http://dx.doi.org/10.3390/s23187789 |
_version_ | 1785112736374456320 |
---|---|
author | Luo, Zhenhao Wang, Pengfei Xie, Wei Zhou, Xu Wang, Baosheng |
author_sort | Luo, Zhenhao |
collection | PubMed |
description | Binary code similarity detection (BCSD) plays a crucial role in various computer security applications, including vulnerability detection, malware detection, and software component analysis. With the development of the Internet of Things (IoT), there are many binaries built for different instruction set architectures, which calls for BCSD approaches that are robust across architectures. In this study, we propose a novel IoT-oriented binary code similarity detection approach. Our approach leverages a customized transformer-based language model with disentangled attention to capture relative position information. To mitigate out-of-vocabulary (OOV) challenges in the language model, we introduce a base-token prediction pre-training task aimed at capturing basic semantics for unseen tokens. During function embedding generation, we integrate directed jumps, data dependency, and address adjacency to capture multiple block relations. We then assign different weights to different relations and use multi-layer Graph Convolutional Networks (GCN) to generate function embeddings. We implemented a prototype of IoTSim. Our experimental results show that the proposed block relation matrix improves IoTSim by a large margin. With a pool size of [Formula: see text], IoTSim achieves a recall@1 of 0.903 across architectures, outperforming the state-of-the-art approaches Trex, SAFE, and PalmTree. |
format | Online Article Text |
id | pubmed-10535887 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10535887 2023-09-29 Sensors (Basel) Article MDPI 2023-09-11 /pmc/articles/PMC10535887/ /pubmed/37765846 http://dx.doi.org/10.3390/s23187789 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | IoTSim: Internet of Things-Oriented Binary Code Similarity Detection with Multiple Block Relations |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10535887/ https://www.ncbi.nlm.nih.gov/pubmed/37765846 http://dx.doi.org/10.3390/s23187789 |
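
The description in this record reports a recall@1 score over a candidate pool. As a reading aid, the following is a minimal sketch of how recall@1 is commonly computed for function-embedding retrieval. It is not the authors' evaluation code. The helper names `cosine` and `recall_at_1`, the use of cosine similarity as the ranking score, and the toy two-dimensional embeddings are all assumptions made for illustration; the record does not state the pool size (it appears only as "[Formula: see text]"), so none is assumed here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recall_at_1(queries, pool):
    """Fraction of queries whose top-ranked pool candidate is the true match.

    Assumption: queries[i] and pool[i] are embeddings of the same source
    function (for example, compiled for two different architectures).
    """
    hits = 0
    for i, query in enumerate(queries):
        # Index of the pool candidate most similar to this query.
        best = max(range(len(pool)), key=lambda j: cosine(query, pool[j]))
        hits += best == i
    return hits / len(queries)


# Hypothetical toy embeddings, invented for this example only.
queries = [[1.0, 0.05], [0.1, 0.9], [0.3, 0.9]]
pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2]]

print(recall_at_1(queries, pool))
```

In this toy setup the third query is closer to the second pool entry than to its own match, so two of the three queries are retrieved correctly at rank 1. Larger pools make the task harder because each query competes against more distractors, which is why a recall@1 figure is usually reported together with the pool size.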