GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets
BACKGROUND: Feature selection is a crucial step in machine learning analysis. Currently, many feature selection approaches do not ensure satisfying results, in terms of accuracy and computational time, when the amount of data is huge, such as in ‘Omics’ datasets. RESULTS: Here, we propose an innovat...
Main Authors: | Chiesa, Mattia; Maioli, Giada; Colombo, Gualtiero I.; Piacentini, Luca |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7014945/ https://www.ncbi.nlm.nih.gov/pubmed/32046651 http://dx.doi.org/10.1186/s12859-020-3400-6 |
_version_ | 1783496738128003072 |
---|---|
author | Chiesa, Mattia Maioli, Giada Colombo, Gualtiero I. Piacentini, Luca |
author_facet | Chiesa, Mattia Maioli, Giada Colombo, Gualtiero I. Piacentini, Luca |
author_sort | Chiesa, Mattia |
collection | PubMed |
description | BACKGROUND: Feature selection is a crucial step in machine learning analysis. Currently, many feature selection approaches do not ensure satisfying results, in terms of accuracy and computational time, when the amount of data is huge, such as in ‘Omics’ datasets. RESULTS: Here, we propose an innovative implementation of a genetic algorithm, called GARS, for fast and accurate identification of informative features in multi-class and high-dimensional datasets. In all simulations, GARS outperformed two standard filter-based, two ‘wrapper’, and one ‘embedded’ selection methods, showing high classification accuracies in a reasonable computational time. CONCLUSIONS: GARS proved to be a suitable tool for performing feature selection on high-dimensional data. Therefore, GARS could be adopted when standard feature selection approaches do not provide satisfactory results or when there is a huge amount of data to be analyzed. |
format | Online Article Text |
id | pubmed-7014945 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-70149452020-02-20 GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets Chiesa, Mattia Maioli, Giada Colombo, Gualtiero I. Piacentini, Luca BMC Bioinformatics Software BACKGROUND: Feature selection is a crucial step in machine learning analysis. Currently, many feature selection approaches do not ensure satisfying results, in terms of accuracy and computational time, when the amount of data is huge, such as in ‘Omics’ datasets. RESULTS: Here, we propose an innovative implementation of a genetic algorithm, called GARS, for fast and accurate identification of informative features in multi-class and high-dimensional datasets. In all simulations, GARS outperformed two standard filter-based, two ‘wrapper’, and one ‘embedded’ selection methods, showing high classification accuracies in a reasonable computational time. CONCLUSIONS: GARS proved to be a suitable tool for performing feature selection on high-dimensional data. Therefore, GARS could be adopted when standard feature selection approaches do not provide satisfactory results or when there is a huge amount of data to be analyzed. BioMed Central 2020-02-11 /pmc/articles/PMC7014945/ /pubmed/32046651 http://dx.doi.org/10.1186/s12859-020-3400-6 Text en © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Chiesa, Mattia Maioli, Giada Colombo, Gualtiero I. Piacentini, Luca GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets |
title | GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets |
title_full | GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets |
title_fullStr | GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets |
title_full_unstemmed | GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets |
title_short | GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets |
title_sort | gars: genetic algorithm for the identification of a robust subset of features in high-dimensional datasets |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7014945/ https://www.ncbi.nlm.nih.gov/pubmed/32046651 http://dx.doi.org/10.1186/s12859-020-3400-6 |
work_keys_str_mv | AT chiesamattia garsgeneticalgorithmfortheidentificationofarobustsubsetoffeaturesinhighdimensionaldatasets AT maioligiada garsgeneticalgorithmfortheidentificationofarobustsubsetoffeaturesinhighdimensionaldatasets AT colombogualtieroi garsgeneticalgorithmfortheidentificationofarobustsubsetoffeaturesinhighdimensionaldatasets AT piacentiniluca garsgeneticalgorithmfortheidentificationofarobustsubsetoffeaturesinhighdimensionaldatasets |