Feature Selection and Molecular Classification of Cancer Phenotypes: A Comparative Study
The classification of high-dimensional gene expression data is key to the development of effective diagnostic and prognostic tools. Feature selection involves finding the feature subset with the highest power in predicting class labels. Here, we conducted a comparative study focused on different combin...
Main authors: | Zanella, Luca; Facco, Pierantonio; Bezzo, Fabrizio; Cimetta, Elisa |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9408964/ https://www.ncbi.nlm.nih.gov/pubmed/36012350 http://dx.doi.org/10.3390/ijms23169087 |
_version_ | 1784774732429656064 |
---|---|
author | Zanella, Luca Facco, Pierantonio Bezzo, Fabrizio Cimetta, Elisa |
author_facet | Zanella, Luca Facco, Pierantonio Bezzo, Fabrizio Cimetta, Elisa |
author_sort | Zanella, Luca |
collection | PubMed |
description | The classification of high-dimensional gene expression data is key to the development of effective diagnostic and prognostic tools. Feature selection involves finding the feature subset with the highest power in predicting class labels. Here, we conducted a comparative study of different combinations of feature selectors (Chi-Squared, mRMR, Relief-F, and Genetic Algorithms) and classification learning algorithms (Random Forests, PLS-DA, SVM, Regularized Logistic/Multinomial Regression, and kNN) to identify those with the best predictive capacity. The performance of each combination is evaluated through an empirical study on three benchmark cancer-related microarray datasets. Our results first suggest that the quality of the data relevant to the target classes is key for the successful classification of cancer phenotypes. We also show that, for a given classification learning algorithm and dataset, all filters perform similarly. Interestingly, filters achieve results comparable to or even better than those of the GA-based wrappers, while also being easier and faster to implement. Taken together, our findings suggest that simple, well-established feature selectors in combination with optimized classifiers achieve good performance, with no need for complicated and computationally demanding methodologies. |
format | Online Article Text |
id | pubmed-9408964 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-94089642022-08-26 Feature Selection and Molecular Classification of Cancer Phenotypes: A Comparative Study Zanella, Luca Facco, Pierantonio Bezzo, Fabrizio Cimetta, Elisa Int J Mol Sci Article The classification of high dimensional gene expression data is key to the development of effective diagnostic and prognostic tools. Feature selection involves finding the best subset with the highest power in predicting class labels. Here, we conducted a comparative study focused on different combinations of feature selectors (Chi-Squared, mRMR, Relief-F, and Genetic Algorithms) and classification learning algorithms (Random Forests, PLS-DA, SVM, Regularized Logistic/Multinomial Regression, and kNN) to identify those with the best predictive capacity. The performance of each combination is evaluated through an empirical study on three benchmark cancer-related microarray datasets. Our results first suggest that the quality of the data relevant to the target classes is key for the successful classification of cancer phenotypes. We also proved that, for a given classification learning algorithm and dataset, all filters have a similar performance. Interestingly, filters achieve comparable or even better results with respect to the GA-based wrappers, while also being easier and faster to implement. Taken together, our findings suggest that simple, well-established feature selectors in combination with optimized classifiers guarantee good performances, with no need for complicated and computationally demanding methodologies. MDPI 2022-08-13 /pmc/articles/PMC9408964/ /pubmed/36012350 http://dx.doi.org/10.3390/ijms23169087 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Zanella, Luca Facco, Pierantonio Bezzo, Fabrizio Cimetta, Elisa Feature Selection and Molecular Classification of Cancer Phenotypes: A Comparative Study |
title | Feature Selection and Molecular Classification of Cancer Phenotypes: A Comparative Study |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9408964/ https://www.ncbi.nlm.nih.gov/pubmed/36012350 http://dx.doi.org/10.3390/ijms23169087 |
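
The description above pairs filter-type feature selectors with classification algorithms. As a rough illustration of one such pairing (a Chi-Squared filter followed by kNN), the sketch below uses only the Python standard library. It is not the authors' implementation: the synthetic data, the median binarization used to build the contingency table, the leave-one-out evaluation, and all function names and parameter values are assumptions made for the example.

```python
import math
import random
from collections import Counter


def chi2_score(column, labels):
    """Chi-squared statistic between a median-binarized feature and the class labels."""
    median = sorted(column)[len(column) // 2]
    bins = [int(v > median) for v in column]
    n = len(labels)
    observed = Counter(zip(bins, labels))
    bin_totals = Counter(bins)
    class_totals = Counter(labels)
    score = 0.0
    for b, n_b in bin_totals.items():
        for c, n_c in class_totals.items():
            expected = n_b * n_c / n
            score += (observed[(b, c)] - expected) ** 2 / expected
    return score


def select_top_k(X, y, k):
    """Filter step: indices of the k features with the highest chi-squared score."""
    n_features = len(X[0])
    scores = [chi2_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def project(rows, keep):
    """Keep only the selected feature columns."""
    return [[row[j] for j in keep] for row in rows]


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Classifier step: majority vote among the nearest training samples."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return Counter(train_y[i] for i in order[:n_neighbors]).most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy; features are re-selected inside every fold
    so the held-out sample never influences the filter."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        keep = select_top_k(train_X, train_y, k)
        pred = knn_predict(project(train_X, keep), train_y, project([X[i]], keep)[0])
        hits += pred == y[i]
    return hits / len(X)


def make_toy_data(n_samples=40, n_features=50, n_informative=3, seed=0):
    """Hypothetical stand-in for expression data: only the first columns carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 5.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected = select_top_k(X, y, k=3)
accuracy = loo_accuracy(X, y, k=3)
print("selected features:", sorted(selected))
print("leave-one-out accuracy:", accuracy)
```

Running the selection inside each cross-validation fold, rather than once on the full dataset, is a design choice of this sketch that keeps the accuracy estimate from being inflated by information from the held-out sample.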