Informative gene selection and the direct classification of tumors based on relative simplicity

Bibliographic Details
Main Authors: Chen, Yuan, Wang, Lifeng, Li, Lanzhi, Zhang, Hongyan, Yuan, Zheming
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4721022/
https://www.ncbi.nlm.nih.gov/pubmed/26792270
http://dx.doi.org/10.1186/s12859-016-0893-0
author Chen, Yuan
Wang, Lifeng
Li, Lanzhi
Zhang, Hongyan
Yuan, Zheming
collection PubMed
description BACKGROUND: Selecting a parsimonious set of informative genes to build a classifier with high generalization performance is the most important task in the analysis of tumor microarray expression data. Many existing gene pair evaluation methods cannot highlight the diverse patterns of gene pairs because they use only one of two strategies, vertical comparison or horizontal comparison, while individual-gene-ranking methods ignore redundancy and synergy among genes. RESULTS: Here we propose a novel score measure named relative simplicity (RS). We evaluated gene pairs by integrating vertical comparison with horizontal comparison, and finally built an RS-based direct classifier (RS-based DC) on a set of informative genes capable of binary discrimination, using a paired-votes strategy. Nine multi-class gene expression datasets involving human cancers were used to validate the performance of the new method. Compared with the nine reference models, RS-based DC achieved the highest average independent test accuracy (91.40%), the best generalization performance and the smallest average number of informative genes (20.56). Compared with the four reference feature selection methods, RS also achieved the highest average test accuracy with three classifiers (Naïve Bayes, k-Nearest Neighbor and Support Vector Machine), and only RS improved the performance of SVM. CONCLUSIONS: Diverse patterns of gene pairs can be highlighted more fully by integrating the vertical and horizontal comparison strategies. The DC core classifier can effectively control over-fitting. The RS-based feature selection method combined with the DC classifier can lead to more robust selection of informative genes and more robust classification accuracy. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0893-0) contains supplementary material, which is available to authorized users.
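The abstract states that the RS-based DC reaches a multi-class decision from discriminators that each separate two classes, combined with a paired-votes strategy. A minimal sketch of generic one-vs-one pairwise voting follows, assuming the paired-votes strategy works along these lines. This record does not define the RS score or the DC decision rule, so `threshold_rule`, the gene indices and the cut-off values are hypothetical placeholders and not the authors' method.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Hashable, Mapping, Sequence, Tuple

# A binary discriminator for one pair of classes: given a sample's expression
# vector, it returns whichever of the two classes it favours.
BinaryRule = Callable[[Sequence[float]], Hashable]


def pairwise_vote(
    sample: Sequence[float],
    classes: Sequence[Hashable],
    rules: Mapping[Tuple[Hashable, Hashable], BinaryRule],
) -> Hashable:
    """One-vs-one voting: every pair of classes casts one vote; most votes wins."""
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        winner = rules[(a, b)](sample)
        if winner not in (a, b):
            raise ValueError(f"rule for {(a, b)!r} returned {winner!r}")
        votes[winner] += 1
    # max() keeps the first maximum, so ties go to the earliest class listed.
    return max(classes, key=lambda c: votes[c])


def threshold_rule(gene: int, cut: float, low: Hashable, high: Hashable) -> BinaryRule:
    """HYPOTHETICAL rule: vote `low` if gene `gene` is below `cut`, else `high`."""
    return lambda x: low if x[gene] < cut else high


if __name__ == "__main__":
    classes = ["A", "B", "C"]
    rules = {
        ("A", "B"): threshold_rule(0, 5.0, "A", "B"),
        ("A", "C"): threshold_rule(1, 2.0, "A", "C"),
        ("B", "C"): threshold_rule(2, 1.0, "B", "C"),
    }
    print(pairwise_vote([3.0, 1.0, 4.0], classes, rules))  # A (2 votes)
    print(pairwise_vote([7.0, 9.0, 0.5], classes, rules))  # B (2 votes)
```

With k classes this casts k(k-1)/2 votes per sample; the tie-breaking rule shown here is an arbitrary choice for the sketch.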
format Online
Article
Text
id pubmed-4721022
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4721022 2016-01-22 Informative gene selection and the direct classification of tumors based on relative simplicity Chen, Yuan Wang, Lifeng Li, Lanzhi Zhang, Hongyan Yuan, Zheming BMC Bioinformatics Research Article BioMed Central 2016-01-20 /pmc/articles/PMC4721022/ /pubmed/26792270 http://dx.doi.org/10.1186/s12859-016-0893-0 Text en © Chen et al. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article