
Graph-based machine learning improves just-in-time defect prediction

Bibliographic Details
Main authors: Bryan, Jonathan; Moriano, Pablo
Format: Online article, text
Language: English
Published: Public Library of Science, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10101485/
https://www.ncbi.nlm.nih.gov/pubmed/37053155
http://dx.doi.org/10.1371/journal.pone.0284077
Collection: PubMed
Abstract: The increasing complexity of today’s software requires the contribution of thousands of developers. This complex collaboration structure makes developers more likely to introduce defect-prone changes that lead to software faults. Determining when these defect-prone changes are introduced has proven challenging, and using traditional machine learning (ML) methods to make these determinations seems to have reached a plateau. In this work, we build contribution graphs consisting of developers and source files to capture the nuanced complexity of changes required to build software. By leveraging these contribution graphs, our research shows the potential of using graph-based ML to improve Just-In-Time (JIT) defect prediction. We hypothesize that features extracted from the contribution graphs may be better predictors of defect-prone changes than intrinsic features derived from software characteristics. We corroborate our hypothesis using graph-based ML for classifying edges that represent defect-prone changes. This new framing of the JIT defect prediction problem leads to remarkably better results. We test our approach on 14 open-source projects and show that our best model can predict whether or not a code change will lead to a defect with an F1 score as high as 77.55% and a Matthews correlation coefficient (MCC) as high as 53.16%. This represents a 152% higher F1 score and a 3% higher MCC over the state-of-the-art JIT defect prediction. We describe limitations, open challenges, and how this method can be used for operational JIT defect prediction.
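As a rough illustration of the edge-classification framing described in the abstract, the sketch below models each code change as an edge between a developer and a source file in a bipartite contribution graph, derives two simple graph features per edge, and scores predictions with F1 and MCC. Everything here is an assumption for illustration: the toy change list, the helper names (build_contribution_graph, edge_features, f1_and_mcc), the degree features, and the threshold rule are hypothetical and are not the authors' models or features, which this record does not describe.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy change history: (developer, file, is_defect_prone).
# Each tuple is one edge of a bipartite developer-file contribution graph.
CHANGES = [
    ("alice", "core.c", False),
    ("alice", "util.c", False),
    ("bob", "core.c", True),
    ("bob", "net.c", True),
    ("carol", "util.c", False),
    ("carol", "core.c", True),
]


def build_contribution_graph(changes):
    """Return adjacency sets for both sides of the bipartite graph."""
    dev_files = defaultdict(set)  # developer -> files they touched
    file_devs = defaultdict(set)  # file -> developers who touched it
    for dev, path, _ in changes:
        dev_files[dev].add(path)
        file_devs[path].add(dev)
    return dev_files, file_devs


def edge_features(changes, dev_files, file_devs):
    """Hypothetical per-edge features: (developer degree, file degree)."""
    return [(len(dev_files[dev]), len(file_devs[path])) for dev, path, _ in changes]


def f1_and_mcc(y_true, y_pred):
    """Compute F1 and the Matthews correlation coefficient for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


dev_files, file_devs = build_contribution_graph(CHANGES)
features = edge_features(CHANGES, dev_files, file_devs)
# Toy rule (hypothetical): flag a change if its file has >= 3 contributors.
y_pred = [file_deg >= 3 for _, file_deg in features]
y_true = [label for _, _, label in CHANGES]
f1, mcc = f1_and_mcc(y_true, y_pred)
print(f"F1={f1:.4f} MCC={mcc:.4f}")
```

On this toy data the script prints F1=0.6667 MCC=0.3333. In a real pipeline the threshold rule would be replaced by a trained graph-based classifier; MCC is reported alongside F1 because it uses all four confusion-matrix cells and is less sensitive to class imbalance.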
Source: National Center for Biotechnology Information (MEDLINE/PubMed record)
Journal: PLoS One (Research Article), published 2023-04-13
License: © 2023 Bryan, Moriano. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.