Re_Trans: Combined Retrieval and Transformer Model for Source Code Summarization

Bibliographic Details
Main authors: Zhang, Chunyan; Zhou, Qinglei; Qiao, Meng; Tang, Ke; Xu, Lianqiu; Liu, Fudong
Format: Online article, text
Language: English
Published: MDPI, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9601825/
https://www.ncbi.nlm.nih.gov/pubmed/37420392
http://dx.doi.org/10.3390/e24101372
Collection: PubMed
Description: Source code summarization (SCS) is a natural language description of source code functionality. It can help developers understand programs and maintain software efficiently. Retrieval-based methods generate SCS by reorganizing terms selected from the source code or by reusing the SCS of similar code snippets. Generative methods generate SCS via an attentional encoder–decoder architecture. A generative method can generate SCS for any code, but its accuracy is sometimes still far from expectations (due to the lack of large, high-quality training sets). A retrieval-based method is considered to have higher accuracy, but it usually fails to generate SCS for a source code snippet when no similar candidate exists in the database. To effectively combine the advantages of retrieval-based and generative methods, we propose a new method: Re_Trans. For a given code snippet, we first use the retrieval-based method to obtain its most semantically similar code and the corresponding SCS (S_RM). Then, we input the given code and the similar code into the trained discriminator. If the discriminator outputs 1, we take S_RM as the result; otherwise, we use the generative model, a transformer, to generate the SCS of the given code. In particular, we use AST-augmented (Abstract Syntax Tree) and code-sequence-augmented information to make the extraction of source code semantics more complete. Furthermore, we build a new SCS retrieval library from a public dataset. We evaluate our method on a dataset of 2.1 million Java code-comment pairs, and the experimental results show improvement over state-of-the-art (SOTA) benchmarks, which demonstrates the effectiveness and efficiency of our method.
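The retrieve, discriminate, otherwise generate control flow described above can be sketched roughly as follows. This is not the authors' implementation: the lexical cosine retrieval, the similarity-threshold `threshold_discriminator`, the `placeholder_generator`, the `Entry` schema and the toy `LIBRARY` are all hypothetical stand-ins for the paper's semantic retrieval, trained discriminator, transformer and SCS retrieval library.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Entry:
    """One code/summary pair in the retrieval library (hypothetical schema)."""
    code: str
    summary: str


def tokenize(code: str) -> List[str]:
    # Split identifiers on camelCase boundaries and lowercase them,
    # e.g. "getUserName" -> ["get", "user", "name"]; punctuation is ignored.
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", code)
    return [p.lower() for p in parts]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-tokens vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(code: str, library: Sequence[Entry]) -> Tuple[Entry, float]:
    # Stand-in for the retrieval step: return the library entry whose code is
    # most similar to the query (lexical cosine, not learned semantics).
    query = Counter(tokenize(code))
    scored = [(cosine(query, Counter(tokenize(e.code))), e) for e in library]
    score, best = max(scored, key=lambda pair: pair[0])
    return best, score


def threshold_discriminator(code: str, similar_code: str, threshold: float = 0.5) -> int:
    # Hypothetical stand-in for the trained discriminator: outputs 1 when the
    # retrieved code looks close enough to reuse its summary, else 0.
    sim = cosine(Counter(tokenize(code)), Counter(tokenize(similar_code)))
    return 1 if sim >= threshold else 0


def placeholder_generator(code: str) -> str:
    # Hypothetical stand-in for the transformer generator.
    return "<summary generated by a transformer for: %s>" % code[:30]


def re_trans_summarize(
    code: str,
    library: Sequence[Entry],
    discriminator: Callable[[str, str], int],
    generator: Callable[[str], str],
) -> Tuple[str, str]:
    # 1) Retrieve the most similar code and its summary (S_RM).
    best, _ = retrieve(code, library)
    # 2) If the discriminator accepts the pair, reuse S_RM.
    if discriminator(code, best.code) == 1:
        return best.summary, "retrieved"
    # 3) Otherwise fall back to the generative model.
    return generator(code), "generated"


LIBRARY = [
    Entry("public int add(int a, int b) { return a + b; }", "adds two integers"),
    Entry("public String getUserName() { return userName; }", "returns the user name"),
]

if __name__ == "__main__":
    print(re_trans_summarize("public int add(int x, int y) { return x + y; }",
                             LIBRARY, threshold_discriminator, placeholder_generator))
    print(re_trans_summarize("public void close() { stream.close(); }",
                             LIBRARY, threshold_discriminator, placeholder_generator))
```

With this toy library, the renamed `add` method scores a cosine of 0.6 against the stored `add` entry, so its summary is reused; the `close()` query has no close match and falls through to the generator. Because the discriminator and generator are passed in as callables, a trained classifier or a real sequence-to-sequence model could be substituted without changing the control flow.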
Record ID: pubmed-9601825
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Entropy (Basel); MDPI, 2022-09-27
Rights: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).