A study of dealing class imbalance problem with machine learning methods for code smell severity detection using PCA-based feature selection technique
Detecting code smells can be highly helpful for reducing maintenance costs and raising source code quality. Code smells help developers and researchers understand several types of design flaws. Code smells with high severity can cause significant problems for the software and may cause chall...
Main authors: | Rao, Rajwant Singh; Dewangan, Seema; Mishra, Alok; Gupta, Manjari |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10533884/ https://www.ncbi.nlm.nih.gov/pubmed/37758824 http://dx.doi.org/10.1038/s41598-023-43380-8 |
_version_ | 1785112271392866304 |
---|---|
author | Rao, Rajwant Singh; Dewangan, Seema; Mishra, Alok; Gupta, Manjari |
author_sort | Rao, Rajwant Singh |
collection | PubMed |
description | Detecting code smells can be highly helpful for reducing maintenance costs and raising source code quality. Code smells help developers and researchers understand several types of design flaws. Code smells with high severity can cause significant problems for the software and may hamper the system's maintainability. Assessing the severity of detected code smells is therefore essential, because it helps prioritize refactoring efforts. The class imbalance problem further compounds the difficulty of code smell severity detection. In this study, four code smell severity datasets (Data class, God class, Feature envy, and Long method) are selected to detect code smell severity. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied. Each dataset's relevant features are chosen using a feature selection technique based on principal component analysis. The severity of code smells is determined using five machine learning techniques: K-nearest neighbor, Random forest, Decision tree, Multi-layer Perceptron, and Logistic Regression. The study obtained a severity accuracy score of 0.99 with the Random forest and Decision tree approaches on the Long method code smell. The models are compared on accuracy and three other performance measures (Precision, Recall, and F-measure). Performance with and without SMOTE is also compared and presented. The results are promising and may pave the way for further studies in this area. |
format | Online Article Text |
id | pubmed-10533884 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10533884 2023-09-29 A study of dealing class imbalance problem with machine learning methods for code smell severity detection using PCA-based feature selection technique Rao, Rajwant Singh; Dewangan, Seema; Mishra, Alok; Gupta, Manjari Sci Rep Article 
Nature Publishing Group UK 2023-09-27 /pmc/articles/PMC10533884/ /pubmed/37758824 http://dx.doi.org/10.1038/s41598-023-43380-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | A study of dealing class imbalance problem with machine learning methods for code smell severity detection using PCA-based feature selection technique |
topic | Article |
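
The description above names SMOTE as the class balancing step. The record does not give the authors' implementation, so the following is only a minimal standard-library sketch of the general idea: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. The function names `smote` and `balance`, the default `k=5`, and the fixed seed are assumptions made for illustration. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn library, would normally be used instead.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Illustrative sketch only; not the implementation used in the study.
    """
    if n_synthetic == 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(samples, labels, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(tuple(x))
    target = max(len(points) for points in by_class.values())
    out_x, out_y = [], []
    for label, points in by_class.items():
        extra = smote(points, target - len(points), k=k, seed=seed)
        out_x.extend(points + extra)
        out_y.extend([label] * (len(points) + len(extra)))
    return out_x, out_y
```

Calling `balance(samples, labels)` on a list of numeric feature tuples and their severity labels returns a new pair of lists in which every class has as many samples as the largest one. Oversampling is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.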