Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes

Bibliographic Details
Main Authors:	Jirapech-Umpai, Thanyaluk, Aitken, Stuart
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1181625/ https://www.ncbi.nlm.nih.gov/pubmed/15958165 http://dx.doi.org/10.1186/1471-2105-6-148

author	Jirapech-Umpai, Thanyaluk Aitken, Stuart
collection	PubMed
description	BACKGROUND: In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems where three tumour types and nine cell lines respectively must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step whereby the most informative genes are selected from the genes assayed. RESULTS: In the absence of feature selection, classification accuracy on the training data is typically good, but not replicated on the testing data. Gene selection using the RankGene software [3] is shown to significantly improve performance on the testing data. Further, we show that the choice of feature selection criteria can have a significant effect on accuracy. The evolutionary algorithm is shown to perform stably across the space of possible parameter settings – indicating the robustness of the approach. We assess performance using a low variance estimation technique, and present an analysis of the genes most often selected as predictors. CONCLUSION: The computational methods we have developed perform robustly and accurately, and yield results in accord with clinical knowledge: A Z-score analysis of the genes most frequently selected identifies genes known to discriminate AML and Pre-T ALL leukemia. This study also confirms that significantly different sets of genes are found to be most discriminatory as the sample classes are refined.
format	Text
id	pubmed-1181625
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1181625 2005-07-30 Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes Jirapech-Umpai, Thanyaluk Aitken, Stuart BMC Bioinformatics Research Article BioMed Central 2005-06-15 /pmc/articles/PMC1181625/ /pubmed/15958165 http://dx.doi.org/10.1186/1471-2105-6-148 Text en Copyright © 2005 Jirapech-Umpai and Aitken; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1181625/ https://www.ncbi.nlm.nih.gov/pubmed/15958165 http://dx.doi.org/10.1186/1471-2105-6-148
