An Empirical Study of Univariate and Genetic Algorithm-Based Feature Selection in Binary Classification with Microarray Data

BACKGROUND: We consider both univariate- and multivariate-based feature selection for the problem of binary classification with microarray data. The idea is to determine whether the more sophisticated multivariate approach leads to better misclassification error rates because of the potential to con...

Bibliographic Details
Main Authors: Lecocke, Michael; Hess, Kenneth
Format: Text
Language: English
Published: Libertas Academica 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675488/
https://www.ncbi.nlm.nih.gov/pubmed/19458774
_version_ 1782166699394465792
author Lecocke, Michael
Hess, Kenneth
author_facet Lecocke, Michael
Hess, Kenneth
author_sort Lecocke, Michael
collection PubMed
description BACKGROUND: We consider both univariate- and multivariate-based feature selection for the problem of binary classification with microarray data. The idea is to determine whether the more sophisticated multivariate approach leads to better misclassification error rates because of its potential to consider jointly significant subsets of genes (but without overfitting the data). METHODS: We present an empirical study in which 10-fold cross-validation is applied externally to one univariate-based and two multivariate (genetic algorithm (GA)-based) feature selection processes. These procedures are applied with respect to three supervised learning algorithms and six published two-class microarray datasets. RESULTS: Across all datasets and learning algorithms, the average 10-fold external cross-validation error rates for the univariate-, single-stage GA-, and two-stage GA-based processes are 14.2%, 14.6%, and 14.2%, respectively. We also find that the optimism bias estimates from the GA analyses were half those of the univariate approach, but the selection bias estimates from the GA analyses were 2.5 times those of the univariate approach. CONCLUSIONS: We find that the 10-fold external cross-validation misclassification error rates were comparable across the three approaches. Further, we find that a two-stage GA approach did not demonstrate a significant advantage over a single-stage approach. We also find that the univariate approach had higher optimism bias and lower selection bias compared to both GA approaches.
format Text
id pubmed-2675488
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-26754882009-05-20 An Empirical Study of Univariate and Genetic Algorithm-Based Feature Selection in Binary Classification with Microarray Data Lecocke, Michael Hess, Kenneth Cancer Inform Original Research BACKGROUND: We consider both univariate- and multivariate-based feature selection for the problem of binary classification with microarray data. The idea is to determine whether the more sophisticated multivariate approach leads to better misclassification error rates because of the potential to consider jointly significant subsets of genes (but without overfitting the data). METHODS: We present an empirical study in which 10-fold cross-validation is applied externally to both a univariate-based and two multivariate- (genetic algorithm (GA)-) based feature selection processes. These procedures are applied with respect to three supervised learning algorithms and six published two-class microarray datasets. RESULTS: Considering all datasets, and learning algorithms, the average 10-fold external cross-validation error rates for the univariate-, single-stage GA-, and two-stage GA-based processes are 14.2%, 14.6%, and 14.2%, respectively. We also find that the optimism bias estimates from the GA analyses were half that of the univariate approach, but the selection bias estimates from the GA analyses were 2.5 times that of the univariate results. CONCLUSIONS: We find that the 10-fold external cross-validation misclassification error rates were very comparable. Further, we find that a two-stage GA approach did not demonstrate a significant advantage over a 1-stage approach. We also find that the univariate approach had higher optimism bias and lower selection bias compared to both GA approaches. Libertas Academica 2007-02-23 /pmc/articles/PMC2675488/ /pubmed/19458774 Text en © 2006 The authors.
spellingShingle Original Research
Lecocke, Michael
Hess, Kenneth
An Empirical Study of Univariate and Genetic Algorithm-Based Feature Selection in Binary Classification with Microarray Data
title An Empirical Study of Univariate and Genetic Algorithm-Based Feature Selection in Binary Classification with Microarray Data
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675488/
https://www.ncbi.nlm.nih.gov/pubmed/19458774
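
Illustrative note on the abstract's terminology: the sketch below shows, under stated assumptions, what applying 10-fold cross-validation "externally" to a univariate feature selection process can look like, and how optimism bias and selection bias might be computed. It is not the authors' code. The synthetic data, the t-statistic ranking, the nearest-centroid classifier, the number of selected genes, and every function name are hypothetical choices made for illustration. The bias definitions used here (optimism bias = internal CV error minus resubstitution error; selection bias = external CV error minus internal CV error) are one common convention and are assumed rather than taken from this record. No genetic algorithm is implemented.

```python
import random
import statistics


def make_data(n_samples=40, n_genes=200, n_informative=5, seed=0):
    """Synthetic two-class 'microarray' data (hypothetical, illustration only)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so the classes are balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 1.5 * label  # shift the informative genes in class 1
        X.append(row)
        y.append(label)
    return X, y


def select_top(X, y, k):
    """Univariate selection: rank genes by absolute Welch-style t-statistic."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        diff = abs(statistics.mean(a) - statistics.mean(b))
        scores.append(diff / se if se > 0 else 0.0)
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:k]


def centroid_predict(X_train, y_train, X_test, genes):
    """Nearest-centroid classifier restricted to the selected genes."""
    centroids = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        centroids[lab] = [statistics.mean([r[g] for r in rows]) for g in genes]
    preds = []
    for r in X_test:
        dist = {lab: sum((r[g] - c) ** 2 for g, c in zip(genes, centroids[lab]))
                for lab in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds


def resub_error(X, y, k_genes):
    """Resubstitution error: select, train, and test on the same samples."""
    genes = select_top(X, y, k_genes)
    preds = centroid_predict(X, y, X, genes)
    return sum(p != t for p, t in zip(preds, y)) / len(X)


def cv_error(X, y, k_genes, n_folds=10, external=True, seed=1):
    """k-fold CV error with gene selection inside (external) or outside (internal) the folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    # Internal CV: genes are chosen once from ALL samples, so the held-out
    # samples have already influenced the selection step.
    fixed_genes = None if external else select_top(X, y, k_genes)
    errors = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # External CV: selection is repeated using the training folds only.
        genes = select_top(X_tr, y_tr, k_genes) if external else fixed_genes
        preds = centroid_predict(X_tr, y_tr, [X[i] for i in test], genes)
        errors += sum(p != y[i] for p, i in zip(preds, test))
    return errors / len(X)


X, y = make_data()
resub = resub_error(X, y, k_genes=10)
cv_int = cv_error(X, y, k_genes=10, external=False)
cv_ext = cv_error(X, y, k_genes=10, external=True)
optimism_bias = cv_int - resub    # assumed definition
selection_bias = cv_ext - cv_int  # assumed definition
print(f"resub={resub:.3f} cv_int={cv_int:.3f} cv_ext={cv_ext:.3f}")
print(f"optimism_bias={optimism_bias:.3f} selection_bias={selection_bias:.3f}")
```

Both cross-validation runs use the same fold assignment, so any difference between `cv_int` and `cv_ext` comes only from where the selection step is performed. On real data the gene count, classifier, and selection method would be replaced by whatever the study at hand specifies.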