Dimension reduction with redundant gene elimination for tumor classification
BACKGROUND: Analysis of gene expression data for tumor classification is an important application of bioinformatics methods. But it is hard to analyse gene expression data from DNA microarray experiments by commonly used classifiers, because there are only a few observations but with thousands of me...
Main authors: | Zeng, Xue-Qiang; Li, Guo-Zheng; Yang, Jack Y; Yang, Mary Qu; Wu, Geng-Feng |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2423430/ https://www.ncbi.nlm.nih.gov/pubmed/18541061 http://dx.doi.org/10.1186/1471-2105-9-S6-S8 |
_version_ | 1782156096925859840 |
---|---|
author | Zeng, Xue-Qiang Li, Guo-Zheng Yang, Jack Y Yang, Mary Qu Wu, Geng-Feng |
author_facet | Zeng, Xue-Qiang Li, Guo-Zheng Yang, Jack Y Yang, Mary Qu Wu, Geng-Feng |
author_sort | Zeng, Xue-Qiang |
collection | PubMed |
description | BACKGROUND: Analysis of gene expression data for tumor classification is an important application of bioinformatics methods. However, gene expression data from DNA microarray experiments are hard to analyse with commonly used classifiers, because the data sets contain only a few observations but thousands of measured genes. Dimension reduction is often used to handle such a high-dimensional problem, but it is hampered by the large number of redundant features in microarray data sets. RESULTS: Dimension reduction is performed by combining feature extraction with redundant gene elimination for tumor classification. A novel redundancy metric based on DIScriminative Contribution (DISC) is proposed, which estimates feature similarity by explicitly building a linear classifier on each gene. Compared with the standard linear correlation metric, DISC takes the label information into account and directly estimates the redundancy of the discriminative ability of two given features. Based on the DISC metric, a novel algorithm named REDISC (Redundancy Elimination based on Discriminative Contribution) is proposed, which eliminates redundant genes before feature extraction and improves the performance of dimension reduction. Experimental results on two microarray data sets show that the REDISC algorithm effectively and reliably improves the generalization performance of dimension reduction and hence of the classifier used. CONCLUSION: Dimension reduction that performs redundant gene elimination before feature extraction is better for tumor classification than feature extraction alone, and supervised redundant gene elimination is superior to commonly used unsupervised methods such as linear correlation coefficients. |
format | Text |
id | pubmed-2423430 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2423430 2008-06-11 Dimension reduction with redundant gene elimination for tumor classification Zeng, Xue-Qiang Li, Guo-Zheng Yang, Jack Y Yang, Mary Qu Wu, Geng-Feng BMC Bioinformatics Research BACKGROUND: Analysis of gene expression data for tumor classification is an important application of bioinformatics methods. However, gene expression data from DNA microarray experiments are hard to analyse with commonly used classifiers, because the data sets contain only a few observations but thousands of measured genes. Dimension reduction is often used to handle such a high-dimensional problem, but it is hampered by the large number of redundant features in microarray data sets. RESULTS: Dimension reduction is performed by combining feature extraction with redundant gene elimination for tumor classification. A novel redundancy metric based on DIScriminative Contribution (DISC) is proposed, which estimates feature similarity by explicitly building a linear classifier on each gene. Compared with the standard linear correlation metric, DISC takes the label information into account and directly estimates the redundancy of the discriminative ability of two given features. Based on the DISC metric, a novel algorithm named REDISC (Redundancy Elimination based on Discriminative Contribution) is proposed, which eliminates redundant genes before feature extraction and improves the performance of dimension reduction. Experimental results on two microarray data sets show that the REDISC algorithm effectively and reliably improves the generalization performance of dimension reduction and hence of the classifier used. CONCLUSION: Dimension reduction that performs redundant gene elimination before feature extraction is better for tumor classification than feature extraction alone, and supervised redundant gene elimination is superior to commonly used unsupervised methods such as linear correlation coefficients. 
BioMed Central 2008-05-28 /pmc/articles/PMC2423430/ /pubmed/18541061 http://dx.doi.org/10.1186/1471-2105-9-S6-S8 Text en Copyright © 2008 Zeng et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Zeng, Xue-Qiang Li, Guo-Zheng Yang, Jack Y Yang, Mary Qu Wu, Geng-Feng Dimension reduction with redundant gene elimination for tumor classification |
title | Dimension reduction with redundant gene elimination for tumor classification |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2423430/ https://www.ncbi.nlm.nih.gov/pubmed/18541061 http://dx.doi.org/10.1186/1471-2105-9-S6-S8 |
work_keys_str_mv | AT zengxueqiang dimensionreductionwithredundantgeneeliminationfortumorclassification AT liguozheng dimensionreductionwithredundantgeneeliminationfortumorclassification AT yangjacky dimensionreductionwithredundantgeneeliminationfortumorclassification AT yangmaryqu dimensionreductionwithredundantgeneeliminationfortumorclassification AT wugengfeng dimensionreductionwithredundantgeneeliminationfortumorclassification |
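
The abstract describes eliminating redundant genes with a supervised, label-aware similarity before feature extraction. The sketch below illustrates that general idea only. This record does not give the DISC formula, so the code uses a stand-in that is an assumption, not the authors' metric: each gene gets a one-feature nearest-class-mean classifier, and two genes count as redundant when their per-sample predictions agree at or above a threshold. The function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def per_gene_predictions(X, y):
    """Fit a one-feature nearest-class-mean classifier on every gene.

    X: (samples, genes) expression matrix; y: binary labels (0/1).
    Returns a (samples, genes) array of predicted labels.
    """
    mu0 = X[y == 0].mean(axis=0)  # per-gene mean of class 0
    mu1 = X[y == 1].mean(axis=0)  # per-gene mean of class 1
    # Predict class 1 where the value is closer to the class-1 mean.
    return (np.abs(X - mu1) < np.abs(X - mu0)).astype(int)

def eliminate_redundant(X, y, threshold=0.9):
    """Greedy filter (stand-in for DISC, not the published metric).

    Genes are visited from most to least accurate single-gene classifier;
    a gene is kept only if its predictions agree with every already-kept
    gene on less than `threshold` of the samples.
    """
    preds = per_gene_predictions(X, y)
    accuracy = (preds == y[:, None]).mean(axis=0)
    order = np.argsort(-accuracy, kind="stable")
    kept = []
    for g in order:
        if all((preds[:, g] == preds[:, k]).mean() < threshold for k in kept):
            kept.append(int(g))
    return kept

# Toy data: gene 1 is a rescaled copy of gene 0, gene 2 behaves differently.
y = np.array([0, 0, 0, 1, 1, 1])
g0 = np.array([0.0, 0.1, 0.2, 1.0, 1.1, 1.2])
X = np.column_stack([g0, 2 * g0 + 5, [0, 1, 0, 1, 0, 1]])
print(eliminate_redundant(X, y))  # → [0, 2]
```

In a pipeline of the kind the abstract outlines, the retained columns `X[:, kept]` would then be passed to a feature extraction step and a classifier. Unlike a plain correlation filter, this stand-in uses the labels `y` when judging similarity, which is the contrast the abstract draws.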