
Feature Selection and Molecular Classification of Cancer Phenotypes: A Comparative Study



Bibliographic Details
Main authors: Zanella, Luca; Facco, Pierantonio; Bezzo, Fabrizio; Cimetta, Elisa
Format: Online, Article, Text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9408964/
https://www.ncbi.nlm.nih.gov/pubmed/36012350
http://dx.doi.org/10.3390/ijms23169087
author Zanella, Luca
Facco, Pierantonio
Bezzo, Fabrizio
Cimetta, Elisa
collection PubMed
description The classification of high-dimensional gene expression data is key to the development of effective diagnostic and prognostic tools. Feature selection involves finding the subset of features with the highest power in predicting class labels. Here, we conducted a comparative study focused on different combinations of feature selectors (Chi-Squared, mRMR, Relief-F, and Genetic Algorithms) and classification learning algorithms (Random Forests, PLS-DA, SVM, Regularized Logistic/Multinomial Regression, and kNN) to identify those with the best predictive capacity. The performance of each combination was evaluated through an empirical study on three benchmark cancer-related microarray datasets. Our results first suggest that the quality of the data relevant to the target classes is key for the successful classification of cancer phenotypes. We also showed that, for a given classification learning algorithm and dataset, all filters perform similarly. Interestingly, filters achieve results comparable to, or even better than, those of the GA-based wrappers, while also being easier and faster to implement. Taken together, our findings suggest that simple, well-established feature selectors in combination with optimized classifiers guarantee good performance, with no need for complicated and computationally demanding methodologies.
format Online
Article
Text
id pubmed-9408964
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9408964 (2022-08-26); Int J Mol Sci; Article; MDPI, 2022-08-13; Text; en. © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Feature Selection and Molecular Classification of Cancer Phenotypes: A Comparative Study
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9408964/
https://www.ncbi.nlm.nih.gov/pubmed/36012350
http://dx.doi.org/10.3390/ijms23169087
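The abstract contrasts simple filters such as Chi-Squared with GA-based wrappers. The following is a minimal sketch of the filter-then-classify idea in plain Python, not the authors' pipeline. It assumes that each gene is binarized at its median before the chi-squared test and that a plain kNN classifier follows. All function names (`chi2_score`, `select_top_k`, `knn_predict`, `filter_then_classify`, `make_toy_data`), the parameters (`n_genes`, `k`) and the toy data are hypothetical. Genes are ranked on the training samples only, so the test samples play no part in the selection.

```python
import math
from collections import Counter


def chi2_score(values, labels):
    """Chi-squared statistic between one gene's expression and the class labels.

    Assumption (hypothetical): expression is binarized at the median, which
    gives a 2 x C contingency table of high/low expression versus class.
    """
    median = sorted(values)[len(values) // 2]
    high = [v >= median for v in values]
    n = len(values)
    score = 0.0
    for level in (True, False):
        row_total = sum(1 for h in high if h == level)
        for c in sorted(set(labels)):
            col_total = sum(1 for y in labels if y == c)
            expected = row_total * col_total / n
            observed = sum(1 for h, y in zip(high, labels) if h == level and y == c)
            if expected > 0:  # skip empty rows (e.g. a constant gene)
                score += (observed - expected) ** 2 / expected
    return score


def select_top_k(X, y, n_genes):
    """Filter step: rank genes (columns of X) by chi-squared score, keep the best."""
    scores = [chi2_score([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:n_genes]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean)."""
    nearest = sorted((math.dist(row, x), label) for row, label in zip(X_train, y_train))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


def filter_then_classify(X_train, y_train, X_test, n_genes=2, k=3):
    """Select genes on the training samples only, then classify the test samples."""
    keep = select_top_k(X_train, y_train, n_genes)

    def project(rows):
        return [[row[j] for j in keep] for row in rows]

    train, test = project(X_train), project(X_test)
    return keep, [knn_predict(train, y_train, x, k) for x in test]


def make_toy_data():
    """Hypothetical toy data: 20 samples x 4 genes, two classes.

    Genes 0 and 1 separate the classes; genes 2 and 3 carry little or no signal.
    """
    X, y = [], []
    for i in range(20):
        tumor = i < 10
        jitter = (i % 3) * 0.1
        high, low = 5.0 + jitter, 1.0 + jitter
        X.append([high if tumor else low, low if tumor else high, i % 2, i % 4])
        y.append("tumor" if tumor else "normal")
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    X_test = [[5.05, 1.05, 0, 3], [1.05, 5.05, 1, 0]]
    genes, predictions = filter_then_classify(X, y, X_test, n_genes=2, k=3)
    print("selected genes:", genes)      # expected: [0, 1]
    print("predictions:", predictions)   # expected: ['tumor', 'normal']
```

A wrapper would instead replace `select_top_k` with a search over candidate gene subsets, for example a genetic algorithm. That search scores each subset by the classifier's own accuracy, and this repeated retraining is the extra cost the abstract weighs against filters.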