Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes
BACKGROUND: In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multicl...
Main Authors: | Jirapech-Umpai, Thanyaluk; Aitken, Stuart |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2005 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1181625/ https://www.ncbi.nlm.nih.gov/pubmed/15958165 http://dx.doi.org/10.1186/1471-2105-6-148 |
_version_ | 1782124629967503360 |
---|---|
author | Jirapech-Umpai, Thanyaluk Aitken, Stuart |
author_facet | Jirapech-Umpai, Thanyaluk Aitken, Stuart |
author_sort | Jirapech-Umpai, Thanyaluk |
collection | PubMed |
description | BACKGROUND: In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems where three tumour types and nine cell lines respectively must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step whereby the most informative genes are selected from the genes assayed. RESULTS: In the absence of feature selection, classification accuracy on the training data is typically good, but not replicated on the testing data. Gene selection using the RankGene software [3] is shown to significantly improve performance on the testing data. Further, we show that the choice of feature selection criteria can have a significant effect on accuracy. The evolutionary algorithm is shown to perform stably across the space of possible parameter settings – indicating the robustness of the approach. We assess performance using a low variance estimation technique, and present an analysis of the genes most often selected as predictors. CONCLUSION: The computational methods we have developed perform robustly and accurately, and yield results in accord with clinical knowledge: A Z-score analysis of the genes most frequently selected identifies genes known to discriminate AML and Pre-T ALL leukemia. This study also confirms that significantly different sets of genes are found to be most discriminatory as the sample classes are refined. |
format | Text |
id | pubmed-1181625 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-11816252005-07-30 Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes Jirapech-Umpai, Thanyaluk Aitken, Stuart BMC Bioinformatics Research Article BACKGROUND: In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems where three tumour types and nine cell lines respectively must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step whereby the most informative genes are selected from the genes assayed. RESULTS: In the absence of feature selection, classification accuracy on the training data is typically good, but not replicated on the testing data. Gene selection using the RankGene software [3] is shown to significantly improve performance on the testing data. Further, we show that the choice of feature selection criteria can have a significant effect on accuracy. The evolutionary algorithm is shown to perform stably across the space of possible parameter settings – indicating the robustness of the approach. We assess performance using a low variance estimation technique, and present an analysis of the genes most often selected as predictors. CONCLUSION: The computational methods we have developed perform robustly and accurately, and yield results in accord with clinical knowledge: A Z-score analysis of the genes most frequently selected identifies genes known to discriminate AML and Pre-T ALL leukemia. This study also confirms that significantly different sets of genes are found to be most discriminatory as the sample classes are refined. 
BioMed Central 2005-06-15 /pmc/articles/PMC1181625/ /pubmed/15958165 http://dx.doi.org/10.1186/1471-2105-6-148 Text en Copyright © 2005 Jirapech-Umpai and Aitken; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Jirapech-Umpai, Thanyaluk Aitken, Stuart Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes |
title | Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes |
title_full | Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes |
title_fullStr | Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes |
title_full_unstemmed | Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes |
title_short | Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes |
title_sort | feature selection and classification for microarray data analysis: evolutionary methods for identifying predictive genes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1181625/ https://www.ncbi.nlm.nih.gov/pubmed/15958165 http://dx.doi.org/10.1186/1471-2105-6-148 |
work_keys_str_mv | AT jirapechumpaithanyaluk featureselectionandclassificationformicroarraydataanalysisevolutionarymethodsforidentifyingpredictivegenes AT aitkenstuart featureselectionandclassificationformicroarraydataanalysisevolutionarymethodsforidentifyingpredictivegenes |
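
The abstract describes using an evolutionary algorithm to search for a near-optimal set of predictive genes for a multiclass problem. The record gives no details of the classifier, the genetic operators, or the parameter settings, so the following is only a minimal sketch of the general idea and not the authors' implementation. Everything in it is an assumption made for illustration: the 1-nearest-neighbour classifier, leave-one-out accuracy as the fitness, mutation-only variation with truncation selection, the function names, and the synthetic three-class data (which stands in for real expression values).

```python
import random

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the gene indices in `genes` (assumed fitness)."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)

def mutate(genes, n_genes, rng):
    """Replace one gene in the subset with a gene not currently in it."""
    child = list(genes)
    pos = rng.randrange(len(child))
    candidates = [g for g in range(n_genes) if g not in child]
    child[pos] = rng.choice(candidates)
    return tuple(sorted(child))

def evolve(X, y, subset_size=3, pop_size=20, generations=30, seed=0):
    """Evolve fixed-size gene subsets; the better half survives each generation."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [tuple(sorted(rng.sample(range(n_genes), subset_size)))
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda s: loo_accuracy(X, y, s), reverse=True)
        survivors = ranked[: pop_size // 2]
        children = [mutate(rng.choice(survivors), n_genes, rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    best = max(pop, key=lambda s: loo_accuracy(X, y, s))
    return best, loo_accuracy(X, y, best)

def make_toy_data(n_per_class=8, n_genes=30, informative=(0, 1, 2), seed=1):
    """Synthetic three-class data: only the `informative` genes differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in informative:
                row[g] += 3.0 * label  # class-dependent shift
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_data()
best_genes, best_acc = evolve(X, y)
print("selected gene indices:", best_genes, "LOO accuracy:", best_acc)
```

Because the surviving half of the population is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next. Accuracy measured this way is an estimate on the samples used for selection; the abstract notes that good training accuracy is not necessarily replicated on testing data, so a held-out evaluation would be needed in practice.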