
GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets

Bibliographic Details
Main Authors: Chiesa, Mattia; Maioli, Giada; Colombo, Gualtiero I.; Piacentini, Luca
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7014945/
https://www.ncbi.nlm.nih.gov/pubmed/32046651
http://dx.doi.org/10.1186/s12859-020-3400-6
author Chiesa, Mattia
Maioli, Giada
Colombo, Gualtiero I.
Piacentini, Luca
collection PubMed
description BACKGROUND: Feature selection is a crucial step in machine learning analysis. Currently, many feature selection approaches do not ensure satisfactory results, in terms of accuracy and computational time, when the amount of data is huge, such as in ‘Omics’ datasets. RESULTS: Here, we propose an innovative implementation of a genetic algorithm, called GARS, for fast and accurate identification of informative features in multi-class and high-dimensional datasets. In all simulations, GARS outperformed two standard filter-based, two ‘wrapper’, and one ‘embedded’ selection methods, showing high classification accuracies in a reasonable computational time. CONCLUSIONS: GARS proved to be a suitable tool for performing feature selection on high-dimensional data. Therefore, GARS could be adopted when standard feature selection approaches do not provide satisfactory results or when there is a huge amount of data to be analyzed.
format Online
Article
Text
id pubmed-7014945
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7014945 2020-02-20 GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets Chiesa, Mattia Maioli, Giada Colombo, Gualtiero I. Piacentini, Luca BMC Bioinformatics Software BACKGROUND: Feature selection is a crucial step in machine learning analysis. Currently, many feature selection approaches do not ensure satisfactory results, in terms of accuracy and computational time, when the amount of data is huge, such as in ‘Omics’ datasets. RESULTS: Here, we propose an innovative implementation of a genetic algorithm, called GARS, for fast and accurate identification of informative features in multi-class and high-dimensional datasets. In all simulations, GARS outperformed two standard filter-based, two ‘wrapper’, and one ‘embedded’ selection methods, showing high classification accuracies in a reasonable computational time. CONCLUSIONS: GARS proved to be a suitable tool for performing feature selection on high-dimensional data. Therefore, GARS could be adopted when standard feature selection approaches do not provide satisfactory results or when there is a huge amount of data to be analyzed. BioMed Central 2020-02-11 /pmc/articles/PMC7014945/ /pubmed/32046651 http://dx.doi.org/10.1186/s12859-020-3400-6 Text en © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets
topic Software