Hybrid feature selection based on SLI and genetic algorithm for microarray datasets

Bibliographic Details
Main authors: Abasabadi, Sedighe; Nematzadeh, Hossein; Motameni, Homayun; Akbari, Ebrahim
Format: Online Article Text
Language: English
Published: Springer US 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9244444/
https://www.ncbi.nlm.nih.gov/pubmed/35789817
http://dx.doi.org/10.1007/s11227-022-04650-w
Description
Summary: One of the major problems with microarray datasets is their large number of features, which causes the “curse of dimensionality” when machine learning is applied to them. Feature selection is the process of finding an optimal feature set by removing irrelevant and redundant features. It plays a significant role in pattern recognition, classification, and machine learning. This study presents a new and efficient hybrid feature selection method called Ga(rank&rand). The method combines a wrapper feature selection algorithm based on the genetic algorithm (GA) with a proposed filter feature selection method, SLI-γ. In Ga(rank&rand), some initial solutions are built from the most relevant features according to SLI-γ, and the remaining ones consist only of random features. Eleven standard high-dimensional datasets were used to evaluate the accuracy of the proposed SLI-γ. Additionally, four well-known high-dimensional microarray datasets were used in an extensive experimental study to evaluate the performance of Ga(rank&rand). This experimental analysis showed the robustness of the method as well as its ability to obtain highly accurate solutions at the earlier stages of the GA evolutionary process. Finally, the performance of Ga(rank&rand) was compared with that of GA to highlight its competitiveness and its ability to reduce the original feature set size and execution time.
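
The mixed initialization described in the summary (some GA individuals seeded from top-ranked features, the rest random) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record does not define SLI-γ, so `scores` stands in for any per-feature relevance score, and the function name `init_population` and the parameters `n_ranked`, `top_k`, and `subset_size` are hypothetical.

```python
import random


def init_population(scores, pop_size, n_ranked, top_k, subset_size, seed=0):
    """Build an initial GA population of binary feature masks.

    scores      : one relevance score per feature (a stand-in for SLI-γ,
                  which is not defined in this record)
    n_ranked    : how many individuals are seeded from the top_k features
    subset_size : number of features switched on in each individual
    The other pop_size - n_ranked individuals draw from all features.
    """
    rng = random.Random(seed)
    n_features = len(scores)
    # Feature indices sorted by descending relevance score.
    ranked = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    top = ranked[:top_k]

    population = []
    for i in range(pop_size):
        # Rank-seeded individuals sample from the top features only;
        # the rest sample uniformly from the whole feature set.
        pool = top if i < n_ranked else list(range(n_features))
        chosen = rng.sample(pool, min(subset_size, len(pool)))
        mask = [0] * n_features
        for j in chosen:
            mask[j] = 1
        population.append(mask)
    return population
```

Calling `init_population([0.1, 0.9, 0.5, 0.8, 0.2, 0.7], pop_size=4, n_ranked=2, top_k=3, subset_size=2)` yields four masks, the first two of which can only select features 1, 3, and 5. A real wrapper would then evolve these masks with crossover, mutation, and a classifier-based fitness function, none of which are shown here.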