
GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets



Bibliographic Details
Main authors: Chiesa, Mattia; Maioli, Giada; Colombo, Gualtiero I.; Piacentini, Luca
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7014945/
https://www.ncbi.nlm.nih.gov/pubmed/32046651
http://dx.doi.org/10.1186/s12859-020-3400-6
Description
Summary: BACKGROUND: Feature selection is a crucial step in machine learning analysis. Currently, many feature selection approaches do not ensure satisfactory results, in terms of accuracy and computational time, when the amount of data is huge, as in ‘Omics’ datasets. RESULTS: Here, we propose an innovative implementation of a genetic algorithm, called GARS, for fast and accurate identification of informative features in multi-class and high-dimensional datasets. In all simulations, GARS outperformed two standard filter-based, two ‘wrapper’, and one ‘embedded’ selection method, showing high classification accuracies in a reasonable computational time. CONCLUSIONS: GARS proved to be a suitable tool for performing feature selection on high-dimensional data. Therefore, GARS could be adopted when standard feature selection approaches do not provide satisfactory results or when there is a huge amount of data to be analyzed.
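The summary describes GARS only at a high level. The sketch below illustrates the general idea of genetic-algorithm feature selection and is not GARS itself: each chromosome is a fixed-size set of feature indices, fitness is scored on those features, and tournament selection, crossover, mutation, and elitism evolve the population. The fitness function (resubstitution accuracy of a nearest-centroid classifier), the toy dataset, and every name and parameter (`make_toy_data`, `subset_size`, `mutation_rate`, and so on) are hypothetical assumptions chosen for illustration.

```python
import random

def make_toy_data(n_per_class=20, n_features=50, n_informative=5, seed=0):
    """Hypothetical toy dataset: 3 classes; only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(n_informative):
                row[j] += 3.0 * label  # shift informative features by class
            X.append(row)
            y.append(label)
    return X, y

def fitness(subset, X, y):
    """Illustrative fitness: nearest-centroid accuracy using only the selected features."""
    labels = sorted(set(y))
    centroids = {}
    for c in labels:
        rows = [row for row, lab in zip(X, y) if lab == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in subset]
    correct = 0
    for row, label in zip(X, y):
        point = [row[j] for j in subset]
        pred = min(labels, key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))
        correct += pred == label
    return correct / len(X)

def tournament(pop, scores, k, rng):
    """Pick the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]

def crossover(p1, p2, rng):
    """Child draws a fixed-size subset from the union of both parents' features."""
    pool = sorted(set(p1) | set(p2))
    return tuple(sorted(rng.sample(pool, len(p1))))

def mutate(ind, n_features, rate, rng):
    """Replace each gene, with probability `rate`, by a feature not already selected."""
    genes = set(ind)
    for g in list(genes):
        if rng.random() < rate:
            candidates = [f for f in range(n_features) if f not in genes]
            genes.remove(g)
            genes.add(rng.choice(candidates))
    return tuple(sorted(genes))

def ga_feature_selection(X, y, subset_size=5, pop_size=30, generations=25,
                         mutation_rate=0.1, tournament_k=3, seed=1):
    """Return (best subset, its fitness, best fitness per generation)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pop = [tuple(sorted(rng.sample(range(n_features), subset_size))) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(ind, X, y) for ind in pop]
        best_idx = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best_idx])
        next_pop = [pop[best_idx]]  # elitism: carry the best individual over unchanged
        while len(next_pop) < pop_size:
            child = crossover(tournament(pop, scores, tournament_k, rng),
                              tournament(pop, scores, tournament_k, rng), rng)
            next_pop.append(mutate(child, n_features, mutation_rate, rng))
        pop = next_pop
    scores = [fitness(ind, X, y) for ind in pop]
    best_idx = max(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best_idx])
    return pop[best_idx], scores[best_idx], history

X, y = make_toy_data()
best, score, history = ga_feature_selection(X, y)
print("selected features:", best, "fitness:", round(score, 3))
```

Because the best individual is copied unchanged into each new generation, the best fitness per generation in this sketch can never decrease. A method aimed at high-dimensional ‘Omics’ data would need a fitness measure that resists overfitting, for example one evaluated on held-out samples, rather than the resubstitution accuracy used here.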