Malicious source code detection using a translation model

Bibliographic Details
Main authors: Tsfaty, Chen; Fire, Michael
Format: Online, Article, Text
Language: English
Published: Elsevier, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10382987/
https://www.ncbi.nlm.nih.gov/pubmed/37521045
http://dx.doi.org/10.1016/j.patter.2023.100773
Collection: PubMed
Abstract: Modern software development often relies on open-source code sharing. Open-source code reuse, however, allows hackers to access wide developer communities, thereby potentially affecting many products. An increasing number of such “supply chain attacks” have occurred in recent years, taking advantage of open-source software development practices. Here, we introduce the Malicious Source code Detection using a Translation model (MSDT) algorithm. MSDT is a novel deep-learning-based analysis method that detects real-world code injections into source code packages. We have tested MSDT by embedding examples from a dataset of over 600,000 different functions and then applying a clustering algorithm to the resulting embedding vectors to identify malicious functions by detecting outliers. We evaluated MSDT’s performance with extensive experiments and demonstrated that MSDT could detect malicious code injections with precision@k values of up to 0.909.
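The abstract above describes the detection pipeline only at a high level: embed each function, cluster the embedding vectors, treat outliers as candidate malicious functions, and score the resulting ranking with precision@k. The Python sketch below illustrates only the last three steps, under loudly stated assumptions: the embedding vectors are taken as given (the translation model MSDT uses to produce them is not reproduced here), the clustering step is a DBSCAN-style density rule chosen purely for illustration (the abstract does not say which clustering algorithm MSDT uses), and the helper names (`core_and_noise`, `rank_outliers`, `precision_at_k`) and the `eps`/`min_pts` values are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def core_and_noise(
    points: List[Vector], eps: float, min_pts: int
) -> Tuple[Set[int], Set[int]]:
    """DBSCAN-style labelling (illustrative assumption, not MSDT's method).

    A point is *core* if at least `min_pts` points (itself included) lie
    within `eps`. A non-core point within `eps` of a core point is a border
    point of that cluster; every other point is *noise*, i.e. an outlier.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if euclidean(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    noise = {
        i
        for i in range(n)
        if i not in core and not any(j in core for j in neighbors[i])
    }
    return core, noise


def rank_outliers(points: List[Vector], eps: float, min_pts: int) -> List[int]:
    """Return noise-point indices, most isolated first.

    Isolation is the distance to the nearest core point (infinite when
    there are no core points at all).
    """
    core, noise = core_and_noise(points, eps, min_pts)

    def isolation(i: int) -> float:
        return min((euclidean(points[i], points[c]) for c in core), default=math.inf)

    return sorted(noise, key=isolation, reverse=True)


def precision_at_k(ranked: List[int], malicious: Set[int], k: int) -> float:
    """Fraction of the top-k ranked items that are truly malicious."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    return sum(1 for i in top_k if i in malicious) / k


if __name__ == "__main__":
    # Hypothetical toy 2-D "embeddings": five tightly clustered benign
    # functions and two distant points standing in for injected code.
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
              (5.0, 5.0), (-4.0, 3.0)]
    ranked = rank_outliers(points, eps=0.5, min_pts=3)
    print(ranked)                            # [5, 6]
    print(precision_at_k(ranked, {5}, k=2))  # 0.5
```

A density-based rule is a convenient choice for this illustration because it marks points in sparse regions as noise without fixing the number of clusters in advance; on real embedding vectors, `eps` and `min_pts` would need tuning, and the precision@k figure reported in the abstract comes from the authors' own experiments, not from this sketch.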
Record ID: pubmed-10382987
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Patterns (N Y)
Publication date: 2023-06-06
License: © 2023 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).