
Implementing a high-efficiency similarity analysis approach for firmware code

The rapid expansion of the open-source community has shortened the software development cycle, but the spread of vulnerabilities has been accelerated, especially in the field of the Internet of Things. In recent years, the frequency of attacks against connected devices is increasing exponentially; t...


Bibliographic Details
Main Authors: Wang, Yisen, Wang, Ruimin, Jing, Jing, Wang, Huanwei
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7802928/
https://www.ncbi.nlm.nih.gov/pubmed/33434199
http://dx.doi.org/10.1371/journal.pone.0245098
_version_ 1783635841636106240
author Wang, Yisen
Wang, Ruimin
Jing, Jing
Wang, Huanwei
author_facet Wang, Yisen
Wang, Ruimin
Jing, Jing
Wang, Huanwei
author_sort Wang, Yisen
collection PubMed
description The rapid expansion of the open-source community has shortened the software development cycle but has also accelerated the spread of vulnerabilities, especially in the Internet of Things. In recent years, the frequency of attacks against connected devices has been increasing exponentially; thus, the vulnerabilities are more serious in nature. State-of-the-art firmware security inspection technologies, such as methods based on machine learning and graph theory, find similar applications based on known vulnerabilities but cannot do anything without detailed information about those vulnerabilities. Moreover, model training, which machine learning technologies require, takes a significant amount of time and data, resulting in low efficiency and poor extensibility. To address these shortcomings, this study proposes a high-efficiency similarity analysis approach for firmware code. First, control flow features and data flow features are extracted from the functions of the firmware and of the vulnerabilities, and these features are used to calculate the SimHash of each function. Mass storage and fast querying of the SimHash values are implemented using the pigeonhole principle. Second, the similar function pairs are analyzed in detail within and among the basic blocks. Within the basic blocks, symbolic execution is used to generate the semantic information of each basic block, and a constraint solver is used to determine semantic equivalence. Among the basic blocks, the local control flow graphs are analyzed to obtain their similarity. We then implement a prototype and present its evaluation. The evaluation results demonstrate that the proposed approach can perform large-scale firmware function similarity analysis. It can also locate real-world firmware patches without vulnerability function information. Finally, we compare our method with existing methods. The comparison results demonstrate that our method is more efficient and accurate than Gemini and StagedMethod. More than 90% of the firmware functions can be indexed within 0.1 s, while the search time for 100,000 firmware functions is less than 2 s.
format Online
Article
Text
id pubmed-7802928
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7802928 2021-01-22 Implementing a high-efficiency similarity analysis approach for firmware code Wang, Yisen Wang, Ruimin Jing, Jing Wang, Huanwei PLoS One Research Article Public Library of Science 2021-01-12 /pmc/articles/PMC7802928/ /pubmed/33434199 http://dx.doi.org/10.1371/journal.pone.0245098 Text en © 2021 Wang et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Wang, Yisen
Wang, Ruimin
Jing, Jing
Wang, Huanwei
Implementing a high-efficiency similarity analysis approach for firmware code
title Implementing a high-efficiency similarity analysis approach for firmware code
title_full Implementing a high-efficiency similarity analysis approach for firmware code
title_fullStr Implementing a high-efficiency similarity analysis approach for firmware code
title_full_unstemmed Implementing a high-efficiency similarity analysis approach for firmware code
title_short Implementing a high-efficiency similarity analysis approach for firmware code
title_sort implementing a high-efficiency similarity analysis approach for firmware code
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7802928/
https://www.ncbi.nlm.nih.gov/pubmed/33434199
http://dx.doi.org/10.1371/journal.pone.0245098
work_keys_str_mv AT wangyisen implementingahighefficiencysimilarityanalysisapproachforfirmwarecode
AT wangruimin implementingahighefficiencysimilarityanalysisapproachforfirmwarecode
AT jingjing implementingahighefficiencysimilarityanalysisapproachforfirmwarecode
AT wanghuanwei implementingahighefficiencysimilarityanalysisapproachforfirmwarecode
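
Illustrative note: the description says that a SimHash is calculated from each function's control flow and data flow features, and that storage and fast querying rely on the pigeonhole principle. The Python sketch below shows one common way those two ideas fit together; it is not the authors' implementation. The feature strings, the weights, the 64-bit fingerprint width, the MD5-based feature hash, and the names `simhash`, `hamming`, and `PigeonholeIndex` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

BITS = 64  # assumed fingerprint width; the record does not state one


def simhash(features, bits=BITS):
    """Weighted SimHash over (feature_string, weight) pairs."""
    counts = [0] * bits
    for feat, weight in features:
        digest = hashlib.md5(feat.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each feature votes +weight or -weight on every bit position.
            counts[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class PigeonholeIndex:
    """Index fingerprints so that near-duplicates are found without a full scan.

    Each fingerprint is cut into k + 1 bands. If two fingerprints differ in
    at most k bits, at least one band must be identical (pigeonhole
    principle), so only entries sharing a band need an exact comparison.
    """

    def __init__(self, k=3, bits=BITS):
        self.k = k
        self.bands = k + 1
        self.width = bits // self.bands
        self.tables = [defaultdict(set) for _ in range(self.bands)]
        self.fingerprints = {}

    def _keys(self, fp):
        mask = (1 << self.width) - 1
        return [(fp >> (i * self.width)) & mask for i in range(self.bands)]

    def add(self, name, fp):
        self.fingerprints[name] = fp
        for table, key in zip(self.tables, self._keys(fp)):
            table[key].add(name)

    def query(self, fp):
        """Return names of stored fingerprints within Hamming distance k."""
        candidates = set()
        for table, key in zip(self.tables, self._keys(fp)):
            candidates |= table.get(key, set())
        return sorted(
            n for n in candidates if hamming(self.fingerprints[n], fp) <= self.k
        )


if __name__ == "__main__":
    # Hypothetical feature strings for one firmware function.
    feats = [("cfg:blocks=5", 2), ("cfg:edges=7", 2), ("df:calls=memcpy", 3)]
    index = PigeonholeIndex(k=3)
    index.add("fw_func_A", simhash(feats))
    print(index.query(simhash(feats)))
```

With k = 3 and 64-bit fingerprints, each lookup touches four hash tables and then verifies only the candidates that share a 16-bit band. This is the usual reason a pigeonhole-style index scales to large function sets, although the figures in the description come from the authors' own prototype, not from this sketch.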