Cargando…

FindICI: Using machine learning to detect linguistic inconsistencies between code and natural language descriptions in infrastructure-as-code

Linguistic anti-patterns are recurring poor practices concerning inconsistencies in the naming, documentation, and implementation of an entity. They impede the readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in Infrastructure...

Descripción completa

Detalles Bibliográficos
Autores principales: Borovits, Nemania, Kumara, Indika, Di Nucci, Dario, Krishnan, Parvathy, Palma, Stefano Dalla, Palomba, Fabio, Tamburri, Damian A., Heuvel, Willem-Jan van den
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Springer US 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9489593/
https://www.ncbi.nlm.nih.gov/pubmed/36159893
http://dx.doi.org/10.1007/s10664-022-10215-5
Descripción
Sumario:Linguistic anti-patterns are recurring poor practices concerning inconsistencies in the naming, documentation, and implementation of an entity. They impede the readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in Infrastructure-as-Code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic/body of IaC code units and their short text names. To this end, we propose FindICI a novel automated approach that employs word embedding and classification algorithms. We build and use the abstract syntax tree of IaC code units to create code embeddings used by machine learning techniques to detect inconsistent IaC code units. We evaluated our approach with two experiments on Ansible tasks systematically extracted from open source repositories for various word embedding models and classification algorithms. Classical machine learning models and novel deep learning models with different word embedding methods showed comparable and satisfactory results in detecting inconsistent Ansible tasks related to the top-10 used Ansible modules.