
An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction

Bibliographic Details
Main Authors: Luo, Haoyu, Dai, Heng, Peng, Weiqiang, Hu, Wenhua, Li, Fuyang
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8625928/
https://www.ncbi.nlm.nih.gov/pubmed/34833608
http://dx.doi.org/10.3390/s21227535
_version_ 1784606541609959424
author Luo, Haoyu
Dai, Heng
Peng, Weiqiang
Hu, Wenhua
Li, Fuyang
author_facet Luo, Haoyu
Dai, Heng
Peng, Weiqiang
Hu, Wenhua
Li, Fuyang
author_sort Luo, Haoyu
collection PubMed
description Ranking-oriented cross-project defect prediction (ROCPDP), which ranks software modules of a new target industrial project based on the predicted defect number or density, has been suggested in the literature. A major concern of ROCPDP is the distribution difference between the source project (aka. within-project) data and target project (aka. cross-project) data, which evidently degrades prediction performance. To investigate the impacts of training data selection methods on the performances of ROCPDP models, we examined the practical effects of nine training data selection methods, including a global filter, which does not filter out any cross-project data. Additionally, the prediction performances of ROCPDP models trained on the filtered cross-project data using the training data selection methods were compared with those of ranking-oriented within-project defect prediction (ROWPDP) models trained on sufficient and limited within-project data. Eleven available defect datasets from industrial projects were considered and evaluated using two ranking performance measures, i.e., FPA and Norm(Popt). The results showed no statistically significant differences among these nine training data selection methods in terms of FPA and Norm(Popt). The performances of ROCPDP models trained on filtered cross-project data were not comparable with those of ROWPDP models trained on sufficient historical within-project data. However, ROCPDP models trained on filtered cross-project data achieved better performance values than ROWPDP models trained on limited historical within-project data. Therefore, we recommend that software quality teams exploit other project datasets to perform ROCPDP when there is no or limited within-project data.
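For context, the following is a minimal sketch, assuming NumPy arrays of module metrics and defect counts, of two concepts named in the abstract: a nearest-neighbour-style training data selection filter (in the spirit of the Burak filter) and the FPA ranking measure. It is illustrative only; the function names, the Euclidean distance, and k=10 are assumptions made here, not the exact procedures evaluated in the paper, and the Norm(Popt) measure is not shown.

```python
# Illustrative sketch, not the authors' exact procedures.
import numpy as np


def nn_filter(source_X, source_y, target_X, k=10):
    """For each target-project module, keep its k nearest source-project
    modules (Euclidean distance over the software metrics); the union of
    the selected rows forms the filtered cross-project training set."""
    selected = set()
    for t in target_X:
        dists = np.linalg.norm(source_X - t, axis=1)
        selected.update(int(i) for i in np.argsort(dists)[:k])
    idx = sorted(selected)
    return source_X[idx], source_y[idx]


def fpa(actual_defects, predicted_defects):
    """Fault-percentile-average. With modules sorted in ascending order of
    predicted defect count, FPA = sum_i(i * n_i) / (k * N), where n_i is the
    actual defect count of the i-th module, k the number of modules and N
    the total number of actual defects; higher is better."""
    order = np.argsort(predicted_defects)          # ascending by prediction
    n = np.asarray(actual_defects, dtype=float)[order]
    k, total = len(n), n.sum()
    if total == 0:
        return 0.0
    ranks = np.arange(1, k + 1)
    return float((ranks * n).sum() / (k * total))
```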
format Online
Article
Text
id pubmed-8625928
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-86259282021-11-27 An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction Luo, Haoyu Dai, Heng Peng, Weiqiang Hu, Wenhua Li, Fuyang Sensors (Basel) Article Ranking-oriented cross-project defect prediction (ROCPDP), which ranks software modules of a new target industrial project based on the predicted defect number or density, has been suggested in the literature. A major concern of ROCPDP is the distribution difference between the source project (aka. within-project) data and target project (aka. cross-project) data, which evidently degrades prediction performance. To investigate the impacts of training data selection methods on the performances of ROCPDP models, we examined the practical effects of nine training data selection methods, including a global filter, which does not filter out any cross-project data. Additionally, the prediction performances of ROCPDP models trained on the filtered cross-project data using the training data selection methods were compared with those of ranking-oriented within-project defect prediction (ROWPDP) models trained on sufficient and limited within-project data. Eleven available defect datasets from the industrial projects were considered and evaluated using two ranking performance measures, i.e., FPA and Norm(Popt). The results showed no statistically significant differences among these nine training data selection methods in terms of FPA and Norm(Popt). The performances of ROCPDP models trained on filtered cross-project data were not comparable with those of ROWPDP models trained on sufficient historical within-project data. However, ROCPDP models trained on filtered cross-project data achieved better performance values than ROWPDP models trained on limited historical within-project data. Therefore, we recommended that software quality teams exploit other project datasets to perform ROCPDP when there is no or limited within-project data. MDPI 2021-11-12 /pmc/articles/PMC8625928/ /pubmed/34833608 http://dx.doi.org/10.3390/s21227535 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Luo, Haoyu
Dai, Heng
Peng, Weiqiang
Hu, Wenhua
Li, Fuyang
An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction
title An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction
title_full An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction
title_fullStr An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction
title_full_unstemmed An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction
title_short An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction
title_sort empirical study of training data selection methods for ranking-oriented cross-project defect prediction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8625928/
https://www.ncbi.nlm.nih.gov/pubmed/34833608
http://dx.doi.org/10.3390/s21227535
work_keys_str_mv AT luohaoyu anempiricalstudyoftrainingdataselectionmethodsforrankingorientedcrossprojectdefectprediction
AT daiheng anempiricalstudyoftrainingdataselectionmethodsforrankingorientedcrossprojectdefectprediction
AT pengweiqiang anempiricalstudyoftrainingdataselectionmethodsforrankingorientedcrossprojectdefectprediction
AT huwenhua anempiricalstudyoftrainingdataselectionmethodsforrankingorientedcrossprojectdefectprediction
AT lifuyang anempiricalstudyoftrainingdataselectionmethodsforrankingorientedcrossprojectdefectprediction
AT luohaoyu empiricalstudyoftrainingdataselectionmethodsforrankingorientedcrossprojectdefectprediction
AT daiheng empiricalstudyoftrainingdataselectionmethodsforrankingorientedcrossprojectdefectprediction
AT pengweiqiang empiricalstudyoftrainingdataselectionmethodsforrankingorientedcrossprojectdefectprediction
AT huwenhua empiricalstudyoftrainingdataselectionmethodsforrankingorientedcrossprojectdefectprediction
AT lifuyang empiricalstudyoftrainingdataselectionmethodsforrankingorientedcrossprojectdefectprediction