A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction

The high dimensionality of software metric features has long been noted as a data quality problem that affects the performance of software defect prediction (SDP) models. This drawback makes it necessary to apply feature selection (FS) algorithm(s) in SDP processes. FS approaches can be categorized...

Bibliographic Details
Main authors: Balogun, Abdullateef O., Basri, Shuib, Mahamad, Saipunidzam, Capretz, Luiz Fernando, Imam, Abdullahi Abubakar, Almomani, Malek A., Adeyemo, Victor E., Kumar, Ganesh
Format: Online Article Text
Language: English
Published: Hindawi 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8635927/
https://www.ncbi.nlm.nih.gov/pubmed/34868291
http://dx.doi.org/10.1155/2021/5069016
_version_ 1784608425524592640
author Balogun, Abdullateef O.
Basri, Shuib
Mahamad, Saipunidzam
Capretz, Luiz Fernando
Imam, Abdullahi Abubakar
Almomani, Malek A.
Adeyemo, Victor E.
Kumar, Ganesh
collection PubMed
description The high dimensionality of software metric features has long been noted as a data quality problem that affects the performance of software defect prediction (SDP) models. This drawback makes it necessary to apply feature selection (FS) algorithms in SDP processes. FS approaches can be categorized into three types, namely, filter FS (FFS), wrapper FS (WFS), and hybrid FS (HFS). HFS has been established as superior because it combines the strengths of both FFS and WFS methods. However, selecting the most appropriate FFS method for HFS (the filter rank selection problem) is a challenge because the performance of FFS methods depends on the choice of datasets and classifiers. In addition, HFS inherits the local optima stagnation and high computational cost that WFS incurs from large search spaces. As a solution, this study proposes a novel rank aggregation-based hybrid multifilter wrapper feature selection (RAHMFWFS) method for selecting relevant and nonredundant features from software defect datasets. The proposed RAHMFWFS proceeds in two stages. The first stage is a rank aggregation-based multifilter feature selection (RMFFS) method that addresses the filter rank selection problem by aggregating individual rank lists from multiple filter methods, using a novel rank aggregation method to generate a single, robust, and non-disjoint rank list. In the second stage, the aggregated ranked features are further processed by an enhanced wrapper feature selection (EWFS) method based on a dynamic reranking strategy that guides the feature subset selection process of the HFS method. This, in turn, reduces the number of evaluation cycles while maintaining or improving prediction performance. The feasibility of the proposed RAHMFWFS was demonstrated on benchmark software defect datasets with Naïve Bayes and Decision Tree classifiers, based on accuracy, the area under the curve (AUC), and F-measure values.
The experimental results showed the effectiveness of RAHMFWFS in addressing the filter rank selection and local optima stagnation problems in HFS, as well as its ability to select optimal features from SDP datasets while maintaining or enhancing the performance of SDP models. In conclusion, the proposed RAHMFWFS improved the prediction performance of SDP models across the selected datasets compared with existing state-of-the-art HFS methods.
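The first stage described above merges rank lists from several filter methods into one list. The record does not say how the paper's novel aggregation works, so the following is only a minimal sketch that swaps in plain mean-rank aggregation to illustrate the general idea. The function name `aggregate_ranks`, the filter names, and the metric names are all hypothetical and do not come from the paper.

```python
from statistics import mean


def aggregate_ranks(rank_lists):
    """Merge several ranked feature lists into one list by mean rank position.

    Illustrative stand-in only: this is ordinary mean-rank aggregation,
    NOT the aggregation method proposed in the paper.
    """
    features = set(rank_lists[0])
    for ranks in rank_lists:
        # Every filter must rank exactly the same features, once each.
        if len(ranks) != len(features) or set(ranks) != features:
            raise ValueError("every rank list must cover the same features")
    # Position 0 is the best rank, so a lower mean position is better.
    mean_rank = {f: mean(ranks.index(f) for ranks in rank_lists) for f in features}
    # Sort by mean rank; break ties by name so the output is deterministic.
    return sorted(features, key=lambda f: (mean_rank[f], f))


# Hypothetical rank lists from three filter methods (assumed names and orderings).
filter_a = ["loc", "cyclomatic", "halstead_effort", "num_operands"]
filter_b = ["cyclomatic", "loc", "num_operands", "halstead_effort"]
filter_c = ["loc", "halstead_effort", "cyclomatic", "num_operands"]

aggregated = aggregate_ranks([filter_a, filter_b, filter_c])
print(aggregated)
```

In a hybrid scheme, a wrapper search would then walk this single aggregated list, rather than one filter's list, when evaluating candidate feature subsets with a classifier.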
format Online
Article
Text
id pubmed-8635927
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-8635927 2021-12-02 A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction Comput Intell Neurosci Research Article Hindawi 2021-11-24 /pmc/articles/PMC8635927/ /pubmed/34868291 http://dx.doi.org/10.1155/2021/5069016 Text en Copyright © 2021 Abdullateef O. Balogun et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8635927/
https://www.ncbi.nlm.nih.gov/pubmed/34868291
http://dx.doi.org/10.1155/2021/5069016