Malicious source code detection using a translation model
Modern software development often relies on open-source code sharing. Open-source code reuse, however, allows hackers to access wide developer communities, thereby potentially affecting many products. An increasing number of such “supply chain attacks” have occurred in recent years, taking advantage...
Main authors: | Tsfaty, Chen; Fire, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10382987/ https://www.ncbi.nlm.nih.gov/pubmed/37521045 http://dx.doi.org/10.1016/j.patter.2023.100773 |
_version_ | 1785080795821506560 |
---|---|
author | Tsfaty, Chen; Fire, Michael |
collection | PubMed |
description | Modern software development often relies on open-source code sharing. Open-source code reuse, however, allows hackers to access wide developer communities, thereby potentially affecting many products. An increasing number of such “supply chain attacks” have occurred in recent years, taking advantage of open-source software development practices. Here, we introduce the Malicious Source code Detection using a Translation model (MSDT) algorithm. MSDT is a novel deep-learning-based analysis method that detects real-world code injections into source code packages. We have tested MSDT by embedding examples from a dataset of over 600,000 different functions and then applying a clustering algorithm to the resulting embedding vectors to identify malicious functions by detecting outliers. We evaluated MSDT’s performance with extensive experiments and demonstrated that MSDT could detect malicious code injections with precision@k values of up to 0.909. |
format | Online Article Text |
id | pubmed-10382987 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-10382987 2023-07-30 Malicious source code detection using a translation model Tsfaty, Chen; Fire, Michael Patterns (N Y) Article Modern software development often relies on open-source code sharing. Open-source code reuse, however, allows hackers to access wide developer communities, thereby potentially affecting many products. An increasing number of such “supply chain attacks” have occurred in recent years, taking advantage of open-source software development practices. Here, we introduce the Malicious Source code Detection using a Translation model (MSDT) algorithm. MSDT is a novel deep-learning-based analysis method that detects real-world code injections into source code packages. We have tested MSDT by embedding examples from a dataset of over 600,000 different functions and then applying a clustering algorithm to the resulting embedding vectors to identify malicious functions by detecting outliers. We evaluated MSDT’s performance with extensive experiments and demonstrated that MSDT could detect malicious code injections with precision@k values of up to 0.909. Elsevier 2023-06-06 /pmc/articles/PMC10382987/ /pubmed/37521045 http://dx.doi.org/10.1016/j.patter.2023.100773 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | Malicious source code detection using a translation model |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10382987/ https://www.ncbi.nlm.nih.gov/pubmed/37521045 http://dx.doi.org/10.1016/j.patter.2023.100773 |
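
The description above mentions ranking functions as outliers in an embedding space and scoring the result with precision@k. The following is a minimal sketch of that idea only, not the MSDT implementation: the record does not name the embedding model or the clustering algorithm, so this example assumes a simple distance-from-centroid outlier score as a stand-in, and the toy vectors, the `malicious` label set, and all function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def outlier_ranking(vectors):
    """Return indices ordered by distance from the centroid, farthest first.

    Assumption: distance from a single centroid stands in for the
    clustering-based outlier detection described in the abstract.
    """
    center = centroid(vectors)
    distances = [math.dist(v, center) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: distances[i], reverse=True)


def precision_at_k(ranked_indices, malicious, k):
    """Fraction of the top-k ranked items that are labeled malicious."""
    top_k = ranked_indices[:k]
    return sum(1 for i in top_k if i in malicious) / k


# Hypothetical toy data: four similar "benign" embeddings and one distant one.
embeddings = [
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.1],
    [0.1, 0.1],
    [5.0, 5.0],  # stands in for an injected function
]
malicious = {4}  # hypothetical ground-truth labels

ranking = outlier_ranking(embeddings)
p_at_1 = precision_at_k(ranking, malicious, k=1)
p_at_2 = precision_at_k(ranking, malicious, k=2)
print(ranking[0], p_at_1, p_at_2)
```

With these toy vectors the distant point ranks first, so precision@1 is 1.0, while precision@2 drops to 0.5 because the second-ranked item is benign. A reported precision@k of 0.909 would mean that roughly 91% of the top-k flagged functions were truly malicious for the k in question.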