
Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy

Bibliographic Details

Main Authors: Zeng, Jianhui, Qu, Zhiheng, Cai, Bo
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10138082/
https://www.ncbi.nlm.nih.gov/pubmed/37190358
http://dx.doi.org/10.3390/e25040570
_version_ 1785032622358921216
author Zeng, Jianhui
Qu, Zhiheng
Cai, Bo
author_facet Zeng, Jianhui
Qu, Zhiheng
Cai, Bo
author_sort Zeng, Jianhui
collection PubMed
description Source code summarization focuses on generating high-quality natural language descriptions of a code snippet (e.g., its functionality, usage, and version). In real development environments, code descriptions are often missing or inconsistent with the code because of human factors, which makes it difficult for developers to comprehend the code and carry out subsequent maintenance. Some existing methods generate summaries from the sequence information of code without considering its structural information. Recently, researchers have adopted Graph Neural Networks (GNNs) over modified Abstract Syntax Trees (ASTs) to capture structural information and represent source code more comprehensively, but how to align the two information encoders is hard to decide. In this paper, we propose a source code summarization model named SSCS, a unified transformer-based encoder–decoder architecture, for capturing structural and sequence information. SSCS is built upon a structure-induced transformer with three main novel improvements. SSCS captures structural information at multiple scales with an adapted fusion strategy and adopts a hierarchical encoding strategy to capture textual information from the perspective of the whole document. Moreover, SSCS utilizes a bidirectional decoder that generates the summary from opposite directions to balance generation performance between the prefix and the suffix. We conduct experiments on two public datasets (Java and Python) to evaluate our method, and the results show that SSCS outperforms state-of-the-art code summarization methods.
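
The abstract does not include implementation details, so the following is only a rough, hypothetical Python sketch of the prefix/suffix balancing idea it describes: decode the summary both left-to-right and right-to-left, then keep the hypothesis with the better average token score, so that neither the beginning nor the end of the summary systematically suffers. The function names, the scoring scheme, and the stub decoders are illustrative assumptions, not the authors' code.

from typing import Callable, List, Tuple

Token = str
Hypothesis = Tuple[List[Token], float]  # (tokens, mean token log-probability)
StepFn = Callable[[List[Token]], Tuple[Token, float]]  # prefix -> (next token, log-prob)


def greedy_decode(step: StepFn, max_len: int = 30, eos: Token = "</s>") -> Hypothesis:
    # Plain greedy decoding loop; a real model would typically use beam search.
    tokens: List[Token] = []
    total = 0.0
    for _ in range(max_len):
        tok, logp = step(tokens)
        total += logp
        if tok == eos:
            break
        tokens.append(tok)
    return tokens, total / max(len(tokens), 1)


def balanced_summary(l2r_step: StepFn, r2l_step: StepFn, max_len: int = 30) -> List[Token]:
    # Run a left-to-right and a right-to-left decoder, then keep whichever
    # hypothesis scores better on average, balancing prefix and suffix quality.
    fwd_tokens, fwd_score = greedy_decode(l2r_step, max_len)
    bwd_tokens, bwd_score = greedy_decode(r2l_step, max_len)
    bwd_tokens = list(reversed(bwd_tokens))  # the R2L decoder emits the suffix first
    return fwd_tokens if fwd_score >= bwd_score else bwd_tokens


# Toy usage with stub decoders standing in for trained forward/backward decoders:
if __name__ == "__main__":
    fwd_vocab = ["returns", "the", "sum", "of", "two", "numbers", "</s>"]
    bwd_vocab = ["numbers", "two", "of", "sum", "the", "returns", "</s>"]

    def l2r(prefix: List[Token]) -> Tuple[Token, float]:
        return fwd_vocab[len(prefix)], -0.1

    def r2l(prefix: List[Token]) -> Tuple[Token, float]:
        return bwd_vocab[len(prefix)], -0.2

    print(" ".join(balanced_summary(l2r, r2l)))  # -> "returns the sum of two numbers"
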
format Online
Article
Text
id pubmed-10138082
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10138082 2023-04-28 Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy Zeng, Jianhui Qu, Zhiheng Cai, Bo Entropy (Basel) Article Source code summarization focuses on generating high-quality natural language descriptions of a code snippet (e.g., its functionality, usage, and version). In real development environments, code descriptions are often missing or inconsistent with the code because of human factors, which makes it difficult for developers to comprehend the code and carry out subsequent maintenance. Some existing methods generate summaries from the sequence information of code without considering its structural information. Recently, researchers have adopted Graph Neural Networks (GNNs) over modified Abstract Syntax Trees (ASTs) to capture structural information and represent source code more comprehensively, but how to align the two information encoders is hard to decide. In this paper, we propose a source code summarization model named SSCS, a unified transformer-based encoder–decoder architecture, for capturing structural and sequence information. SSCS is built upon a structure-induced transformer with three main novel improvements. SSCS captures structural information at multiple scales with an adapted fusion strategy and adopts a hierarchical encoding strategy to capture textual information from the perspective of the whole document. Moreover, SSCS utilizes a bidirectional decoder that generates the summary from opposite directions to balance generation performance between the prefix and the suffix. We conduct experiments on two public datasets (Java and Python) to evaluate our method, and the results show that SSCS outperforms state-of-the-art code summarization methods. MDPI 2023-03-26 /pmc/articles/PMC10138082/ /pubmed/37190358 http://dx.doi.org/10.3390/e25040570 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Zeng, Jianhui
Qu, Zhiheng
Cai, Bo
Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy
title Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy
title_full Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy
title_fullStr Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy
title_full_unstemmed Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy
title_short Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy
title_sort structure and sequence aligned code summarization with prefix and suffix balanced strategy
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10138082/
https://www.ncbi.nlm.nih.gov/pubmed/37190358
http://dx.doi.org/10.3390/e25040570
work_keys_str_mv AT zengjianhui structureandsequencealignedcodesummarizationwithprefixandsuffixbalancedstrategy
AT quzhiheng structureandsequencealignedcodesummarizationwithprefixandsuffixbalancedstrategy
AT caibo structureandsequencealignedcodesummarizationwithprefixandsuffixbalancedstrategy