
The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy

A key goal of feature selection is to increase a model's accuracy while shortening the time needed to build it. Selection is carried out by ranking the features in a dataset by importance using Information Gain (IG). The process is used to calculate the amoun...


Bibliographic Details
Main authors: Prasetiyowati, Maria Irmina; Maulidevi, Nur Ulfa; Surendro, Kridanto
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9299283/
https://www.ncbi.nlm.nih.gov/pubmed/35875646
http://dx.doi.org/10.7717/peerj-cs.1041
author Prasetiyowati, Maria Irmina
Maulidevi, Nur Ulfa
Surendro, Kridanto
collection PubMed
description A key goal of feature selection is to increase a model's accuracy while shortening the time needed to build it. Selection is carried out by ranking the features in a dataset by importance using Information Gain (IG), which measures how much information each feature carries; features with high values are selected to speed up the algorithm. To select informative features, IG relies on a threshold (cut-off) value. This research therefore examines the time and accuracy performance of a feature selection method that integrates IG, the Fast Fourier Transform (FFT), and the Synthetic Minority Oversampling Technique (SMOTE). The feature selection model is then applied to Random Forest, a tree-based machine learning algorithm with random feature selection. Eight datasets were used, three balanced and five imbalanced, and SMOTE was applied to the imbalanced datasets to balance the data. The results showed that feature selection using Information Gain, FFT, and SMOTE improved the accuracy of Random Forest.
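The two steps named in the description, threshold-based Information Gain selection and SMOTE-style balancing, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes discrete feature values, a hand-picked threshold, and a simplified interpolation scheme. The function names (`information_gain`, `select_by_threshold`, `oversample_minority`) are hypothetical. The FFT step is omitted because the record does not say how it is applied, and the Random Forest training step is left out as well.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG of one discrete feature: H(labels) - H(labels | feature)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_by_threshold(columns, labels, threshold):
    """Return the names of features whose IG is at least `threshold`.

    `threshold` is an assumed, hand-picked cut-off for this sketch.
    """
    return [name for name, values in columns.items()
            if information_gain(values, labels) >= threshold]


def oversample_minority(samples, n_new, k=2, seed=0):
    """SMOTE-style sketch: each synthetic point lies on the segment between
    a random minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        neighbours = sorted((s for s in samples if s is not base),
                            key=lambda s: math.dist(base, s))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b)
                               for b, o in zip(base, other)))
    return synthetic


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 1]
    columns = {
        "informative": [0, 0, 1, 1, 0, 1],  # mirrors the labels exactly
        "noise": [0, 1, 0, 1, 0, 1],        # nearly unrelated to the labels
    }
    print(select_by_threshold(columns, labels, threshold=0.5))
    print(oversample_minority([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=4))
```

In a fuller pipeline the selected feature names would be used to subset the training data, the minority class would be oversampled, and the result would be passed to a Random Forest classifier. Library implementations of each step would normally replace these hand-written functions.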
format Online
Article
Text
id pubmed-9299283
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-9299283 2022-07-21 The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy Prasetiyowati, Maria Irmina; Maulidevi, Nur Ulfa; Surendro, Kridanto PeerJ Comput Sci Data Mining and Machine Learning PeerJ Inc. 2022-07-14 /pmc/articles/PMC9299283/ /pubmed/35875646 http://dx.doi.org/10.7717/peerj-cs.1041 Text en ©2022 Prasetiyowati et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy
topic Data Mining and Machine Learning
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9299283/
https://www.ncbi.nlm.nih.gov/pubmed/35875646
http://dx.doi.org/10.7717/peerj-cs.1041