Application of Seq2Seq Models on Code Correction

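We apply various seq2seq models on programming language correction tasks on Juliet Test Suite for C/C++ and Java of Software Assurance Reference Datasets and achieve 75% (for C/C++) and 56% (for Java) repair rates on these tasks. We introduce pyramid encoder in these seq2seq models, which significantly increases the computational efficiency and memory efficiency, while achieving similar repair rate to their nonpyramid counterparts. We successfully carry out error type classification task on ITC benchmark examples (with only 685 code instances) using transfer learning with models pretrained on Juliet Test Suite, pointing out a novel way of processing small programming language datasets.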
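
The abstract credits the pyramid encoder with better computational and memory efficiency but does not describe how it works. The following is a minimal sketch of the general idea behind pyramidal encoders, not the authors' implementation: between encoder layers, adjacent time steps are merged so that each higher layer processes a sequence half as long. The merge by concatenation, the zero padding for odd lengths, and the names `pyramid_reduce` and `pyramid_lengths` are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def pyramid_reduce(states: List[Vector]) -> List[Vector]:
    """Merge each pair of adjacent time steps by concatenation.

    Assumption: concatenation is one common merge choice; the record
    does not say which merge the paper uses.
    """
    if len(states) % 2 == 1:
        # Pad with a zero vector so the last time step has a partner.
        states = states + [[0.0] * len(states[0])]
    return [states[i] + states[i + 1] for i in range(0, len(states), 2)]


def pyramid_lengths(seq_len: int, num_layers: int) -> List[int]:
    """Sequence length seen by each layer when every layer halves it."""
    lengths = [seq_len]
    for _ in range(num_layers - 1):
        seq_len = (seq_len + 1) // 2  # ceiling division, matching the padding
        lengths.append(seq_len)
    return lengths


if __name__ == "__main__":
    hidden = [[1.0], [2.0], [3.0]]
    print(pyramid_reduce(hidden))
    print(pyramid_lengths(100, 3))
```

Because the upper layers and the attention mechanism operate on the shortened sequence, the per-step work above the first layer shrinks roughly in proportion to the reduced length, which is the usual source of the efficiency gain in this family of encoders.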

Bibliographic Details
Main Authors: Huang, Shan; Zhou, Xiao; Chin, Sang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8017285/
https://www.ncbi.nlm.nih.gov/pubmed/33817628
http://dx.doi.org/10.3389/frai.2021.590215
author Huang, Shan
Zhou, Xiao
Chin, Sang
collection PubMed
description We apply various seq2seq models on programming language correction tasks on Juliet Test Suite for C/C++ and Java of Software Assurance Reference Datasets and achieve 75% (for C/C++) and 56% (for Java) repair rates on these tasks. We introduce pyramid encoder in these seq2seq models, which significantly increases the computational efficiency and memory efficiency, while achieving similar repair rate to their nonpyramid counterparts. We successfully carry out error type classification task on ITC benchmark examples (with only 685 code instances) using transfer learning with models pretrained on Juliet Test Suite, pointing out a novel way of processing small programming language datasets.
format Online
Article
Text
id pubmed-8017285
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8017285 2021-04-03. Application of Seq2Seq Models on Code Correction. Huang, Shan; Zhou, Xiao; Chin, Sang. Front Artif Intell. Artificial Intelligence. Frontiers Media S.A. 2021-03-19. /pmc/articles/PMC8017285/ /pubmed/33817628 http://dx.doi.org/10.3389/frai.2021.590215 Text en. Copyright © 2021 Huang, Zhou and Chin. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Application of Seq2Seq Models on Code Correction
topic Artificial Intelligence