Improving accuracy for cancer classification with a new algorithm for genes selection

BACKGROUND: Although the classification of cancer tissue samples based on gene expression data has advanced considerably in recent years, improving its accuracy remains a major challenge. One of the challenges is to establish an effective method that can select a parsimonious set of relevant genes....

Bibliographic Details
Main Authors:	Zhang, Hongyan; Wang, Haiyan; Dai, Zhijun; Chen, Ming-shun; Yuan, Zheming
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3562261/ https://www.ncbi.nlm.nih.gov/pubmed/23148517 http://dx.doi.org/10.1186/1471-2105-13-298

collection	PubMed
description	BACKGROUND: Although the classification of cancer tissue samples based on gene expression data has advanced considerably in recent years, improving its accuracy remains a major challenge. One of the challenges is to establish an effective method that can select a parsimonious set of relevant genes. So far, most gene selection methods in the literature focus on screening individual genes or pairs of genes without considering possible interactions among genes. Here we introduce a new computational method named the Binary Matrix Shuffling Filter (BMSF). It not only overcomes the difficulty associated with the search schemes of traditional wrapper methods and the overfitting problem in a high-dimensional search space, but also takes potential gene interactions into account during gene selection. This method, coupled with a Support Vector Machine (SVM) for implementation, often selects a very small number of genes, which makes the model easy to interpret.
RESULTS: We applied our method to 9 two-class gene expression datasets involving human cancers. During the gene selection process, the set of genes to be kept in the model was recursively refined and repeatedly updated according to the effect of a given gene on the contributions of other genes to cancer classification. The small number of informative genes selected from each dataset led to significantly improved leave-one-out cross-validation (LOOCV) classification accuracy across all 9 datasets for multiple classifiers. The selected genes also generalize broadly, since multiple commonly used classifiers achieved either equivalent or much higher LOOCV accuracy than reported in the literature.
CONCLUSIONS: A gene's contribution to binary cancer classification is better evaluated after adjusting for the joint effect of a large number of other genes. A computationally efficient search scheme was provided to search effectively in the extensive feature space that includes possible interactions of many genes. The performance of the algorithm on 9 datasets suggests that the accuracy of cancer classification can be improved by a large margin when the joint effects of many genes are considered.
format	Online Article Text
id	pubmed-3562261
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3562261 2013-02-05 BMC Bioinformatics Methodology Article BioMed Central 2012-11-13 /pmc/articles/PMC3562261/ /pubmed/23148517 http://dx.doi.org/10.1186/1471-2105-13-298 Text en Copyright ©2012 Zhang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Improving accuracy for cancer classification with a new algorithm for genes selection
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3562261/ https://www.ncbi.nlm.nih.gov/pubmed/23148517 http://dx.doi.org/10.1186/1471-2105-13-298

Similar Items