Informative gene selection and the direct classification of tumors based on relative simplicity
BACKGROUND: Selecting a parsimonious set of informative genes to build highly generalized performance classifier is the most important task for the analysis of tumor microarray expression data. Many existing gene pair evaluation methods cannot highlight diverse patterns of gene pairs only used one s...
Main authors: | Chen, Yuan; Wang, Lifeng; Li, Lanzhi; Zhang, Hongyan; Yuan, Zheming |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4721022/ https://www.ncbi.nlm.nih.gov/pubmed/26792270 http://dx.doi.org/10.1186/s12859-016-0893-0 |
_version_ | 1782411162492600320 |
---|---|
author | Chen, Yuan Wang, Lifeng Li, Lanzhi Zhang, Hongyan Yuan, Zheming |
author_facet | Chen, Yuan Wang, Lifeng Li, Lanzhi Zhang, Hongyan Yuan, Zheming |
author_sort | Chen, Yuan |
collection | PubMed |
description | BACKGROUND: Selecting a parsimonious set of informative genes to build highly generalized performance classifier is the most important task for the analysis of tumor microarray expression data. Many existing gene pair evaluation methods cannot highlight diverse patterns of gene pairs only used one strategy of vertical comparison and horizontal comparison, while individual-gene-ranking method ignores redundancy and synergy among genes. RESULTS: Here we proposed a novel score measure named relative simplicity (RS). We evaluated gene pairs according to integrating vertical comparison with horizontal comparison, finally built RS-based direct classifier (RS-based DC) based on a set of informative genes capable of binary discrimination with a paired votes strategy. Nine multi-class gene expression datasets involving human cancers were used to validate the performance of new method. Compared with the nine reference models, RS-based DC received the highest average independent test accuracy (91.40 %), the best generalization performance and the smallest informative average gene number (20.56). Compared with the four reference feature selection methods, RS also received the highest average test accuracy in three classifiers (Naïve Bayes, k-Nearest Neighbor and Support Vector Machine), and only RS can improve the performance of SVM. CONCLUSIONS: Diverse patterns of gene pairs could be highlighted more fully while integrating vertical comparison with horizontal comparison strategy. DC core classifier can effectively control over-fitting. RS-based feature selection method combined with DC classifier can lead to more robust selection of informative genes and classification accuracy. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0893-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4721022 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-47210222016-01-22 Informative gene selection and the direct classification of tumors based on relative simplicity Chen, Yuan Wang, Lifeng Li, Lanzhi Zhang, Hongyan Yuan, Zheming BMC Bioinformatics Research Article BACKGROUND: Selecting a parsimonious set of informative genes to build highly generalized performance classifier is the most important task for the analysis of tumor microarray expression data. Many existing gene pair evaluation methods cannot highlight diverse patterns of gene pairs only used one strategy of vertical comparison and horizontal comparison, while individual-gene-ranking method ignores redundancy and synergy among genes. RESULTS: Here we proposed a novel score measure named relative simplicity (RS). We evaluated gene pairs according to integrating vertical comparison with horizontal comparison, finally built RS-based direct classifier (RS-based DC) based on a set of informative genes capable of binary discrimination with a paired votes strategy. Nine multi-class gene expression datasets involving human cancers were used to validate the performance of new method. Compared with the nine reference models, RS-based DC received the highest average independent test accuracy (91.40 %), the best generalization performance and the smallest informative average gene number (20.56). Compared with the four reference feature selection methods, RS also received the highest average test accuracy in three classifiers (Naïve Bayes, k-Nearest Neighbor and Support Vector Machine), and only RS can improve the performance of SVM. CONCLUSIONS: Diverse patterns of gene pairs could be highlighted more fully while integrating vertical comparison with horizontal comparison strategy. DC core classifier can effectively control over-fitting. RS-based feature selection method combined with DC classifier can lead to more robust selection of informative genes and classification accuracy. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0893-0) contains supplementary material, which is available to authorized users. BioMed Central 2016-01-20 /pmc/articles/PMC4721022/ /pubmed/26792270 http://dx.doi.org/10.1186/s12859-016-0893-0 Text en © Chen et al. 2016 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Chen, Yuan Wang, Lifeng Li, Lanzhi Zhang, Hongyan Yuan, Zheming Informative gene selection and the direct classification of tumors based on relative simplicity |
title | Informative gene selection and the direct classification of tumors based on relative simplicity |
title_full | Informative gene selection and the direct classification of tumors based on relative simplicity |
title_fullStr | Informative gene selection and the direct classification of tumors based on relative simplicity |
title_full_unstemmed | Informative gene selection and the direct classification of tumors based on relative simplicity |
title_short | Informative gene selection and the direct classification of tumors based on relative simplicity |
title_sort | informative gene selection and the direct classification of tumors based on relative simplicity |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4721022/ https://www.ncbi.nlm.nih.gov/pubmed/26792270 http://dx.doi.org/10.1186/s12859-016-0893-0 |
work_keys_str_mv | AT chenyuan informativegeneselectionandthedirectclassificationoftumorsbasedonrelativesimplicity AT wanglifeng informativegeneselectionandthedirectclassificationoftumorsbasedonrelativesimplicity AT lilanzhi informativegeneselectionandthedirectclassificationoftumorsbasedonrelativesimplicity AT zhanghongyan informativegeneselectionandthedirectclassificationoftumorsbasedonrelativesimplicity AT yuanzheming informativegeneselectionandthedirectclassificationoftumorsbasedonrelativesimplicity |