Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy
Source code summarization focuses on generating qualified natural language descriptions of a code snippet (e.g., functionality, usage and version). In an actual development environment, descriptions of the code are often missing or inconsistent with the code due to human factors, which makes it difficult for developers to comprehend the code and conduct subsequent maintenance. Some existing methods generate summaries from the sequence information of code without considering the structural information. Recently, researchers have adopted Graph Neural Networks (GNNs) over modified Abstract Syntax Trees (ASTs) to capture the structural information and comprehensively represent a source code, but how to align the two information encoders is hard to decide. In this paper, we propose a source code summarization model named SSCS, a unified transformer-based encoder–decoder architecture, for capturing structural and sequence information. SSCS is designed upon a structure-induced transformer with three main novel improvements. SSCS captures the structural information at multiple scales with an adapted fusion strategy and adopts a hierarchical encoding strategy to capture the textual information from the perspective of the document. Moreover, SSCS utilizes a bidirectional decoder which generates summaries from opposite directions to balance the generation performance between the prefix and the suffix. We conduct experiments on two public Java and Python datasets to evaluate our method, and the results show that SSCS outperforms state-of-the-art code summarization methods.
Main Authors: Zeng, Jianhui; Qu, Zhiheng; Cai, Bo
Format: Online Article Text
Language: English
Published: MDPI, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10138082/ https://www.ncbi.nlm.nih.gov/pubmed/37190358 http://dx.doi.org/10.3390/e25040570
_version_ | 1785032622358921216 |
author | Zeng, Jianhui Qu, Zhiheng Cai, Bo |
author_facet | Zeng, Jianhui Qu, Zhiheng Cai, Bo |
author_sort | Zeng, Jianhui |
collection | PubMed |
description | Source code summarization focuses on generating qualified natural language descriptions of a code snippet (e.g., functionality, usage and version). In an actual development environment, descriptions of the code are often missing or inconsistent with the code due to human factors, which makes it difficult for developers to comprehend the code and conduct subsequent maintenance. Some existing methods generate summaries from the sequence information of code without considering the structural information. Recently, researchers have adopted Graph Neural Networks (GNNs) over modified Abstract Syntax Trees (ASTs) to capture the structural information and comprehensively represent a source code, but how to align the two information encoders is hard to decide. In this paper, we propose a source code summarization model named SSCS, a unified transformer-based encoder–decoder architecture, for capturing structural and sequence information. SSCS is designed upon a structure-induced transformer with three main novel improvements. SSCS captures the structural information at multiple scales with an adapted fusion strategy and adopts a hierarchical encoding strategy to capture the textual information from the perspective of the document. Moreover, SSCS utilizes a bidirectional decoder which generates summaries from opposite directions to balance the generation performance between the prefix and the suffix. We conduct experiments on two public Java and Python datasets to evaluate our method, and the results show that SSCS outperforms state-of-the-art code summarization methods. |
format | Online Article Text |
id | pubmed-10138082 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-101380822023-04-28 Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy Zeng, Jianhui Qu, Zhiheng Cai, Bo Entropy (Basel) Article Source code summarization focuses on generating qualified natural language descriptions of a code snippet (e.g., functionality, usage and version). In an actual development environment, descriptions of the code are often missing or inconsistent with the code due to human factors, which makes it difficult for developers to comprehend the code and conduct subsequent maintenance. Some existing methods generate summaries from the sequence information of code without considering the structural information. Recently, researchers have adopted Graph Neural Networks (GNNs) over modified Abstract Syntax Trees (ASTs) to capture the structural information and comprehensively represent a source code, but how to align the two information encoders is hard to decide. In this paper, we propose a source code summarization model named SSCS, a unified transformer-based encoder–decoder architecture, for capturing structural and sequence information. SSCS is designed upon a structure-induced transformer with three main novel improvements. SSCS captures the structural information at multiple scales with an adapted fusion strategy and adopts a hierarchical encoding strategy to capture the textual information from the perspective of the document. Moreover, SSCS utilizes a bidirectional decoder which generates summaries from opposite directions to balance the generation performance between the prefix and the suffix. We conduct experiments on two public Java and Python datasets to evaluate our method, and the results show that SSCS outperforms state-of-the-art code summarization methods. MDPI 2023-03-26 /pmc/articles/PMC10138082/ /pubmed/37190358 http://dx.doi.org/10.3390/e25040570 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Zeng, Jianhui Qu, Zhiheng Cai, Bo Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy |
title | Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy |
title_full | Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy |
title_fullStr | Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy |
title_full_unstemmed | Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy |
title_short | Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy |
title_sort | structure and sequence aligned code summarization with prefix and suffix balanced strategy |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10138082/ https://www.ncbi.nlm.nih.gov/pubmed/37190358 http://dx.doi.org/10.3390/e25040570 |
work_keys_str_mv | AT zengjianhui structureandsequencealignedcodesummarizationwithprefixandsuffixbalancedstrategy AT quzhiheng structureandsequencealignedcodesummarizationwithprefixandsuffixbalancedstrategy AT caibo structureandsequencealignedcodesummarizationwithprefixandsuffixbalancedstrategy |
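The abstract describes a bidirectional decoder that generates summaries from opposite directions to balance quality between the prefix and the suffix. The idea can be sketched with a toy objective in plain Python: train one decoder on the summary as-is and another on the reversed summary, then average the two cross-entropies, so that suffix tokens receive the same early-step exposure that prefix tokens get in a standard left-to-right decoder. This is a minimal illustrative sketch, not the authors' SSCS implementation; all names (`xent`, `bidirectional_loss`) and the toy distributions are hypothetical.

```python
import math

def xent(step_probs, target):
    """Average negative log-likelihood of `target` under per-step
    token distributions `step_probs` (one dict of token -> prob per step)."""
    return -sum(math.log(p[t]) for p, t in zip(step_probs, target)) / len(target)

def bidirectional_loss(fwd_probs, bwd_probs, target):
    """Toy prefix/suffix balanced objective: the forward decoder scores the
    summary as-is, the backward decoder scores the reversed summary, and the
    two cross-entropies are averaged into a single training loss."""
    forward = xent(fwd_probs, target)
    backward = xent(bwd_probs, list(reversed(target)))
    return 0.5 * (forward + backward)

# Toy two-token summary with hand-made decoder output distributions.
target = ["return", "sum"]
fwd = [{"return": 0.9, "sum": 0.1}, {"return": 0.1, "sum": 0.9}]
bwd = [{"sum": 0.8, "return": 0.2}, {"sum": 0.2, "return": 0.8}]
loss = bidirectional_loss(fwd, bwd, target)
```

Because the backward term penalizes a decoder that is only confident near the beginning of the summary, minimizing the averaged loss pushes the model to keep the end of the generated summary as accurate as its start.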