Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification

BACKGROUND: Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes migh...

Bibliographic Details
Main authors: Wang, Shu-Lin; Li, Xue-Ling; Fang, Jianwen
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3465202/
https://www.ncbi.nlm.nih.gov/pubmed/22830977
http://dx.doi.org/10.1186/1471-2105-13-178
author Wang, Shu-Lin
Li, Xue-Ling
Fang, Jianwen
collection PubMed
description BACKGROUND: Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit not only to tumor molecular diagnosis but also to drug development. RESULTS: This paper proposes a novel gene selection method with rich biomedical meaning, based on the Heuristic Breadth-first Search Algorithm (HBSA), to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, an HBSA-based ensemble classifier is constructed using a majority voting strategy from individual classifiers built on the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets, including three pairs of cross-platform datasets, indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. CONCLUSIONS: It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnostic biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtypes and even to hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. These findings are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network.
format Online
Article
Text
id pubmed-3465202
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3465202 2012-10-10 Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification Wang, Shu-Lin Li, Xue-Ling Fang, Jianwen BMC Bioinformatics Research Article BioMed Central 2012-07-25 /pmc/articles/PMC3465202/ /pubmed/22830977 http://dx.doi.org/10.1186/1471-2105-13-178 Text en Copyright ©2012 Wang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification
topic Research Article
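
The abstract in this record describes two post-selection steps: ranking genes by how often they occur in the selected gene subsets, and combining the per-subset classifiers by majority vote. The following is a minimal Python sketch of those two ideas only, not the authors' implementation. The HBSA search itself is not shown, and the function names, gene identifiers, labels, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from collections import Counter


def rank_genes_by_frequency(gene_subsets):
    """Rank genes by the number of selected subsets that contain them."""
    # set(subset) ensures a gene is counted at most once per subset.
    counts = Counter(gene for subset in gene_subsets for gene in set(subset))
    # Most frequent first; ties are ordered by gene name (an assumption,
    # chosen only to make the output deterministic).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    counts = Counter(predictions)
    best = max(counts.values())
    # Tie-breaking by label order is an arbitrary choice for this sketch.
    return min(label for label, n in counts.items() if n == best)


# Hypothetical gene subsets, standing in for the output of a subset search.
subsets = [("G1", "G2"), ("G1", "G3"), ("G1", "G2", "G4")]
ranking = rank_genes_by_frequency(subsets)

# Hypothetical predictions for one sample, one per subset-specific classifier.
votes = ["tumor", "normal", "tumor"]
label = majority_vote(votes)
```

Here `ranking` places `G1` first because it appears in all three hypothetical subsets, and `label` is the class chosen by two of the three hypothetical classifiers.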