
Cross-Project Defect Prediction Based on Two-Phase Feature Importance Amplification

Bibliographic Details
Main Authors: Xing, Ying; Lin, Wanting; Lin, Xueyan; Yang, Bin; Tan, Zhou
Format: Online Article Text
Language: English
Published: Hindawi 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9038416/
https://www.ncbi.nlm.nih.gov/pubmed/35479605
http://dx.doi.org/10.1155/2022/2320447
Collection: PubMed
Abstract: As a typical application of computational intelligence in software engineering, cross-project defect prediction (CPDP) uses labeled data from other projects (source projects) to build models that predict defects in the current projects (target projects), helping testers quickly locate defective modules. However, class imbalance and differing data distributions across projects make CPDP a challenging problem. To address these two problems, we propose a two-phase feature importance amplification (TFIA) CPDP model that tackles them in two phases: a domain adaptation phase and a classification phase. In the domain adaptation phase, differences in data distribution among projects are reduced by filtering both the source and target projects, and correlation-based feature selection with greedy best-first search amplifies the importance of features with strong feature-class correlation. In the classification phase, Random Forest serves as the classifier, further amplifying the importance of highly correlated features and producing a model that is sensitive to them. We conducted both ablation experiments and comparison experiments on the widely used AEEEM database. Experimental results show that TFIA yields significant improvement on CPDP. Moreover, the TFIA CPDP model performs stably and efficiently across all experiments, which lays a solid foundation for its further application in practical engineering.
Record ID: pubmed-9038416
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Index entry: pubmed-9038416 (2022-04-26)
Journal: Comput Intell Neurosci (Computational Intelligence and Neuroscience)
Article type: Research Article
Publication date: 2022-04-18
Copyright © 2022 Ying Xing et al.
License: This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
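The abstract names correlation-based feature selection (CFS) with greedy best-first search as the step that amplifies features strongly correlated with the defect label. The sketch below is not the authors' implementation. It assumes numeric features and a 0/1 defect label, uses absolute Pearson correlation as a stand-in for whatever correlation measure the paper actually uses, and the names `pearson`, `cfs_merit`, `best_first_cfs`, and the stopping parameter `max_stale` are hypothetical. Subsets are scored with Hall's CFS merit, k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), which rewards features correlated with the class and penalizes features correlated with each other.

```python
import heapq
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, r_cf, r_ff):
    """Hall's CFS merit: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[i] for i in subset) / k
    pairs = list(combinations(sorted(subset), 2))
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def best_first_cfs(columns, labels, max_stale=5):
    """Forward best-first search over feature subsets (hypothetical helper).

    Expands the highest-merit subset found so far by adding one feature at a
    time; stops after `max_stale` expansions that fail to beat the best merit.
    Returns (sorted feature indices, merit).
    """
    n = len(columns)
    # Absolute feature-class and feature-feature correlations (assumed measure).
    r_cf = [abs(pearson(col, labels)) for col in columns]
    r_ff = [[abs(pearson(columns[i], columns[j])) for j in range(n)] for i in range(n)]

    start = frozenset()
    best, best_merit = start, 0.0
    open_heap = [(-0.0, 0, start)]  # (negated merit, tie-breaker, subset)
    visited = {start}
    counter = 1
    stale = 0
    while open_heap and stale < max_stale:
        _, _, current = heapq.heappop(open_heap)
        improved = False
        for f in range(n):
            if f in current:
                continue
            child = current | {f}  # frozenset union stays hashable
            if child in visited:
                continue
            visited.add(child)
            merit = cfs_merit(child, r_cf, r_ff)
            heapq.heappush(open_heap, (-merit, counter, child))
            counter += 1
            if merit > best_merit + 1e-12:
                best, best_merit = child, merit
                improved = True
        stale = 0 if improved else stale + 1
    return sorted(best), best_merit


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    columns = [
        [0, 0, 0, 0, 1, 1, 1, 1],  # tracks the label exactly
        [0, 0, 0, 0, 1, 1, 1, 1],  # redundant copy of feature 0
        [0, 1, 0, 1, 0, 1, 0, 1],  # uncorrelated with the label
    ]
    print(best_first_cfs(columns, labels))  # → ([0], 1.0)
```

In the toy example the redundant copy adds no merit and the uncorrelated feature lowers it, so only feature 0 is kept. Per the abstract, the selected features would then be passed to a Random Forest classifier in the second phase; that step, and the project-filtering step, are not shown here.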