
Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning

Bibliographic details
Main authors: Mafarja, Majdi; Thaher, Thaer; Al-Betar, Mohammed Azmi; Too, Jingwei; Awadallah, Mohammed A.; Abu Doush, Iyad; Turabieh, Hamza
Format: Online article, text
Language: English
Published: Springer US, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9909674/
https://www.ncbi.nlm.nih.gov/pubmed/36785593
http://dx.doi.org/10.1007/s10489-022-04427-x
Collection: PubMed
Description: Software Fault Prediction (SFP) is the process of identifying faulty software components, such as faulty classes or modules, early in the software development life cycle. This paper proposes a machine learning (ML) framework for SFP. First, pre-processing and re-sampling techniques are applied to prepare the SFP datasets for use by ML techniques. Seven classifiers are then compared: K-Nearest Neighbors (KNN), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF). The RF classifier outperforms all other classifiers in terms of eliminating irrelevant/redundant features. The performance of RF is further improved by a dimensionality reduction method, the binary whale optimization algorithm (BWOA), which eliminates irrelevant/redundant features. Finally, BWOA is enhanced by hybridizing it with the exploration strategies of the grey wolf optimizer (GWO) and Harris hawks optimization (HHO) algorithms; the resulting method is called SBEWOA. Sixteen SFP datasets, covering software projects of different sizes and complexity, are selected from the PROMISE repository. A comparative evaluation against nine well-established feature selection methods shows that SBEWOA produces competitive results, and significantly superior results on several of the evaluated datasets. The algorithms are compared in terms of accuracy, number of selected features, and fitness value, and the differences are confirmed by the two-tailed p-values of the Wilcoxon signed-rank test. In conclusion, the proposed method is an efficient alternative ML method for SFP and can be applied to similar problems in the software engineering domain.
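The wrapper feature-selection idea summarized above (a binary whale optimizer searching over feature subsets, scored by a fitness that trades classification error against subset size) can be sketched as follows. This is a minimal plain binary WOA baseline, not the authors' SBEWOA: it omits the GWO/HHO exploration strategies entirely. Several details are assumptions not stated in this record: the fitness form `alpha * error + (1 - alpha) * selected / total` with `alpha = 0.99`, the S-shaped transfer `1 / (1 + exp(-10 (x - 0.5)))` used to turn continuous WOA steps into bits, and the population size and iteration count. `toy_error_rate` and `RELEVANT` are hypothetical stand-ins for the cross-validated error of a Random Forest (for example scikit-learn's `RandomForestClassifier`) trained on the selected columns of a PROMISE dataset.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # 1 = feature selected, 0 = feature dropped


def wrapper_fitness(mask: Sequence[int],
                    error_rate: Callable[[Sequence[int]], float],
                    alpha: float = 0.99) -> float:
    """Assumed fitness: weighted error plus ratio of selected features (lower is better)."""
    n_selected = sum(mask)
    if n_selected == 0:
        return 1.0  # no features selected: treat as the worst possible subset
    return alpha * error_rate(mask) + (1 - alpha) * n_selected / len(mask)


def s_transfer(x: float) -> float:
    """Assumed S-shaped transfer function: continuous step -> probability of bit = 1."""
    return 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))


def binary_woa(n_features: int, fitness: Callable[[Mask], float],
               n_whales: int = 10, n_iters: int = 50,
               b: float = 1.0, seed: int = 0) -> Tuple[Mask, float]:
    """Plain binary whale optimization (no GWO/HHO hybridization)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_whales)]
    scores = [fitness(w) for w in pop]
    best_i = min(range(n_whales), key=scores.__getitem__)
    best, best_score = pop[best_i][:], scores[best_i]

    for t in range(n_iters):
        a = 2.0 - 2.0 * t / n_iters  # decreases linearly from 2 towards 0
        for i, whale in enumerate(pop):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            p = rng.random()
            l = rng.uniform(-1.0, 1.0)
            leader = best  # |A| < 1: encircle the best whale (exploitation)
            if p < 0.5 and abs(A) >= 1:
                leader = pop[rng.randrange(n_whales)]  # exploration: a random whale
            new: Mask = []
            for j in range(n_features):
                if p < 0.5:
                    step = leader[j] - A * abs(C * leader[j] - whale[j])
                else:
                    # Spiral (bubble-net) update around the best whale.
                    dist = abs(best[j] - whale[j])
                    step = dist * math.exp(b * l) * math.cos(2 * math.pi * l) + best[j]
                new.append(1 if rng.random() < s_transfer(step) else 0)
            pop[i] = new
            scores[i] = fitness(new)
            if scores[i] < best_score:
                best, best_score = new[:], scores[i]
    return best, best_score


RELEVANT = {0, 2, 5}  # hypothetical "truly useful" feature indices for the toy problem


def toy_error_rate(mask: Sequence[int]) -> float:
    """Hypothetical stand-in for a classifier's cross-validated error rate."""
    missed = sum(1 for j in RELEVANT if not mask[j])
    return missed / len(RELEVANT)


if __name__ == "__main__":
    mask, score = binary_woa(8, lambda m: wrapper_fitness(m, toy_error_rate))
    print("selected features:", [j for j, bit in enumerate(mask) if bit])
    print("fitness:", round(score, 4))
```

Because `alpha` is close to 1, any subset that misses a useful feature scores far worse than one that keeps a few redundant features, so the search first drives error down and only then prunes features. Replacing `toy_error_rate` with a real cross-validated classifier error turns the sketch into a wrapper feature selector; the S-shaped transfer is one common binarization choice, and V-shaped transfer functions are another.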
Record ID: pubmed-9909674
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Applied Intelligence (NLM abbreviation: Appl Intell (Dordr)), Springer US, published 2023-02-09
Rights: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.