Authorship attribution of source code by using back propagation neural network based on particle swarm optimization

Authorship attribution identifies the most likely author of a given sample from a set of known candidate authors. It can be applied not only to discover the original author of plain text, such as novels, blogs, emails, and posts, but also to identify the programmers of source code. Authorship at...

Bibliographic Details
Main authors: Yang, Xinyu; Xu, Guoai; Li, Qi; Guo, Yanhui; Zhang, Miao
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5667828/
https://www.ncbi.nlm.nih.gov/pubmed/29095934
http://dx.doi.org/10.1371/journal.pone.0187204
_version_ 1783275560604008448
author Yang, Xinyu
Xu, Guoai
Li, Qi
Guo, Yanhui
Zhang, Miao
author_facet Yang, Xinyu
Xu, Guoai
Li, Qi
Guo, Yanhui
Zhang, Miao
author_sort Yang, Xinyu
collection PubMed
description Authorship attribution identifies the most likely author of a given sample from a set of known candidate authors. It can be applied not only to discover the original author of plain text, such as novels, blogs, emails, and posts, but also to identify the programmers of source code. Authorship attribution of source code is required in diverse applications, ranging from malicious code tracking to resolving authorship disputes and detecting software plagiarism. This paper proposes a new method to identify the programmer of Java source code samples with higher accuracy. To this end, it first introduces a back propagation (BP) neural network based on particle swarm optimization (PSO) into authorship attribution of source code. It begins by computing a set of defined feature metrics, including lexical and layout metrics and structure and syntax metrics, 19 dimensions in total. These metrics are then input to the neural network for supervised learning, the weights of which are produced by a hybrid PSO and BP algorithm. The effectiveness of the proposed method is evaluated on a collected dataset of 3,022 Java files belonging to 40 authors. Experimental results show that the proposed method achieves 91.060% accuracy. A comparison with previous work on authorship attribution of Java source code shows that the proposed method outperforms the others overall, with acceptable overhead.
format Online
Article
Text
id pubmed-5667828
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5667828 2017-11-17 Authorship attribution of source code by using back propagation neural network based on particle swarm optimization Yang, Xinyu Xu, Guoai Li, Qi Guo, Yanhui Zhang, Miao PLoS One Research Article Authorship attribution identifies the most likely author of a given sample from a set of known candidate authors. It can be applied not only to discover the original author of plain text, such as novels, blogs, emails, and posts, but also to identify the programmers of source code. Authorship attribution of source code is required in diverse applications, ranging from malicious code tracking to resolving authorship disputes and detecting software plagiarism. This paper proposes a new method to identify the programmer of Java source code samples with higher accuracy. To this end, it first introduces a back propagation (BP) neural network based on particle swarm optimization (PSO) into authorship attribution of source code. It begins by computing a set of defined feature metrics, including lexical and layout metrics and structure and syntax metrics, 19 dimensions in total. These metrics are then input to the neural network for supervised learning, the weights of which are produced by a hybrid PSO and BP algorithm. The effectiveness of the proposed method is evaluated on a collected dataset of 3,022 Java files belonging to 40 authors. Experimental results show that the proposed method achieves 91.060% accuracy. A comparison with previous work on authorship attribution of Java source code shows that the proposed method outperforms the others overall, with acceptable overhead.
Public Library of Science 2017-11-02 /pmc/articles/PMC5667828/ /pubmed/29095934 http://dx.doi.org/10.1371/journal.pone.0187204 Text en © 2017 Yang et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Yang, Xinyu
Xu, Guoai
Li, Qi
Guo, Yanhui
Zhang, Miao
Authorship attribution of source code by using back propagation neural network based on particle swarm optimization
title Authorship attribution of source code by using back propagation neural network based on particle swarm optimization
title_full Authorship attribution of source code by using back propagation neural network based on particle swarm optimization
title_fullStr Authorship attribution of source code by using back propagation neural network based on particle swarm optimization
title_full_unstemmed Authorship attribution of source code by using back propagation neural network based on particle swarm optimization
title_short Authorship attribution of source code by using back propagation neural network based on particle swarm optimization
title_sort authorship attribution of source code by using back propagation neural network based on particle swarm optimization
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5667828/
https://www.ncbi.nlm.nih.gov/pubmed/29095934
http://dx.doi.org/10.1371/journal.pone.0187204
work_keys_str_mv AT yangxinyu authorshipattributionofsourcecodebyusingbackpropagationneuralnetworkbasedonparticleswarmoptimization
AT xuguoai authorshipattributionofsourcecodebyusingbackpropagationneuralnetworkbasedonparticleswarmoptimization
AT liqi authorshipattributionofsourcecodebyusingbackpropagationneuralnetworkbasedonparticleswarmoptimization
AT guoyanhui authorshipattributionofsourcecodebyusingbackpropagationneuralnetworkbasedonparticleswarmoptimization
AT zhangmiao authorshipattributionofsourcecodebyusingbackpropagationneuralnetworkbasedonparticleswarmoptimization
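
The description field above says the network weights are produced by a hybrid of PSO and BP, but this record gives no details of the hybrid. As a rough illustration only, the sketch below shows one common way to combine the two: PSO searches for a good starting weight vector, then back propagation fine-tunes it. Everything in it is an assumption made for illustration and not the authors' method: the function names, the network shape (one tanh hidden layer, one sigmoid output), the PSO constants, and the two-feature toy data that stands in for the 19 code metrics.

```python
import math
import random

def forward(w, x, n_in, n_hid):
    """One tanh hidden layer and one sigmoid output; w is a flat weight list."""
    hidden = []
    for j in range(n_hid):
        base = j * (n_in + 1)  # n_in weights followed by one bias per hidden unit
        s = w[base + n_in] + sum(w[base + i] * x[i] for i in range(n_in))
        hidden.append(math.tanh(s))
    base = n_hid * (n_in + 1)  # output weights, then the output bias
    out = w[base + n_hid] + sum(w[base + j] * hidden[j] for j in range(n_hid))
    out = max(-60.0, min(60.0, out))  # keep exp() in a safe range
    return hidden, 1.0 / (1.0 + math.exp(-out))

def loss(w, data, n_in, n_hid):
    """Mean squared error over (features, label) pairs."""
    return sum((forward(w, x, n_in, n_hid)[1] - y) ** 2 for x, y in data) / len(data)

def pso(data, n_in, n_hid, rng, particles=20, iters=60, inertia=0.7, c1=1.5, c2=1.5):
    """Particle swarm search over the flat weight vector (constants are assumed)."""
    dim = n_hid * (n_in + 1) + n_hid + 1
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_val = [loss(p, data, n_in, n_hid) for p in pos]
    g = min(range(particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iters):
        for k in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (best_pos[k][d] - pos[k][d])
                             + c2 * r2 * (g_pos[d] - pos[k][d]))
                pos[k][d] = max(-5.0, min(5.0, pos[k][d] + vel[k][d]))
            val = loss(pos[k], data, n_in, n_hid)
            if val < best_val[k]:
                best_val[k], best_pos[k] = val, pos[k][:]
                if val < g_val:
                    g_val, g_pos = val, pos[k][:]
    return g_pos

def backprop(w, data, n_in, n_hid, epochs=200, lr=0.5):
    """Per-sample gradient descent on squared error, starting from w (w is copied)."""
    w = w[:]
    out_base = n_hid * (n_in + 1)
    for _ in range(epochs):
        for x, y in data:
            hidden, p = forward(w, x, n_in, n_hid)
            d_out = 2 * (p - y) * p * (1 - p)  # gradient at the output pre-activation
            for j in range(n_hid):
                d_hid = d_out * w[out_base + j] * (1 - hidden[j] ** 2)
                base = j * (n_in + 1)
                for i in range(n_in):
                    w[base + i] -= lr * d_hid * x[i]
                w[base + n_in] -= lr * d_hid
                w[out_base + j] -= lr * d_out * hidden[j]
            w[out_base + n_hid] -= lr * d_out
    return w

# Toy XOR-style data; a hypothetical stand-in for real per-file feature vectors.
TOY = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
rng = random.Random(0)
start = pso(TOY, 2, 3, rng)          # global search for initial weights
tuned = backprop(start, TOY, 2, 3)   # local refinement with BP
print("loss after PSO:", loss(start, TOY, 2, 3))
print("loss after BP :", loss(tuned, TOY, 2, 3))
```

A real attribution setup would need one output per candidate author (40 in the dataset described) and 19 inputs; the single-output network here only keeps the sketch short.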