
TAP: A static analysis model for PHP vulnerabilities based on token and deep learning technology

With the widespread use of Web applications, source code security issues are increasing. Exposed vulnerabilities seriously endanger the interests of service providers and customers. Several models address this problem. However, most of them rely on complex graphs generated fr...


Bibliographic Details
Main authors:	Fang, Yong; Han, Shengjun; Huang, Cheng; Wu, Runpu
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2019
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6860437/ https://www.ncbi.nlm.nih.gov/pubmed/31738786 http://dx.doi.org/10.1371/journal.pone.0225196

_version_	1783471238628245504
author	Fang, Yong; Han, Shengjun; Huang, Cheng; Wu, Runpu
author_sort	Fang, Yong
collection	PubMed
description	With the widespread use of Web applications, source code security issues are increasing. Exposed vulnerabilities seriously endanger the interests of service providers and customers. Several models address this problem. However, most of them rely on complex graphs generated from source code or on regex patterns based on expert experience. This paper proposes TAP, an analysis model based on a token mechanism and deep learning technology, to discover vulnerabilities in PHP: Hypertext Preprocessor (PHP) Web programs conveniently and easily. Building on the token mechanism of the PHP language, a custom tokenizer was designed that unifies tokens, supports some PHP features, and optimizes parsing. The tokenizer also implements parameter iteration to achieve data flow analysis. On the Software Assurance Reference Dataset (SARD) and the SQLI-LABS dataset, we trained the deep learning model of TAP by combining the word2vec model with a Long Short-Term Memory (LSTM) network. In the experiment on the CWE-89 dataset, TAP achieves an Area Under the Curve (AUC) of 0.9941, which is better than the other models, and also the highest accuracy, 0.9787. Furthermore, compared with RIPS, TAP performs much better in multiclass classification, with a Kappa of 0.8319 and a Hamming distance of 0.0840.
format	Online Article Text
id	pubmed-6860437
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-6860437 2019-12-07 TAP: A static analysis model for PHP vulnerabilities based on token and deep learning technology. Fang, Yong; Han, Shengjun; Huang, Cheng; Wu, Runpu. PLoS One. Research Article. Public Library of Science 2019-11-18 /pmc/articles/PMC6860437/ /pubmed/31738786 http://dx.doi.org/10.1371/journal.pone.0225196 Text en © 2019 Fang et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	TAP: A static analysis model for PHP vulnerabilities based on token and deep learning technology
topic	Research Article
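The abstract reports multiclass results as a Kappa value and a Hamming distance. As a rough illustration of what those two numbers measure, the sketch below computes Cohen's kappa and the fraction of mismatched labels for a single-label multiclass prediction. The label names and sample data are hypothetical, and the record does not state the exact metric definitions used in the paper, so this is an assumption-laden sketch and not the authors' evaluation code.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    # Observed agreement: fraction of samples where both labelings match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement under independence, from the marginal label counts.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Both labelings use a single identical class; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def hamming_loss(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label.

    Assumption: each sample carries exactly one label, so the normalized
    Hamming distance reduces to the misclassification rate.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    # Hypothetical labels for six PHP samples (not data from the paper).
    truth = ["safe", "sqli", "xss", "sqli", "safe", "xss"]
    guess = ["safe", "sqli", "sqli", "sqli", "safe", "xss"]
    print("kappa:", cohen_kappa(truth, guess))
    print("hamming:", hamming_loss(truth, guess))
```

Under these definitions, a higher kappa and a lower Hamming value both indicate better multiclass agreement, which matches the direction of the comparison with RIPS reported in the abstract.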
