Dimension reduction with redundant gene elimination for tumor classification

BACKGROUND: Analysis of gene expression data for tumor classification is an important application of bioinformatics methods. However, gene expression data from DNA microarray experiments are hard to analyse with commonly used classifiers, because the data sets contain only a few observations but thousands of me...

Bibliographic Details
Main authors: Zeng, Xue-Qiang; Li, Guo-Zheng; Yang, Jack Y; Yang, Mary Qu; Wu, Geng-Feng
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2423430/
https://www.ncbi.nlm.nih.gov/pubmed/18541061
http://dx.doi.org/10.1186/1471-2105-9-S6-S8
author Zeng, Xue-Qiang
Li, Guo-Zheng
Yang, Jack Y
Yang, Mary Qu
Wu, Geng-Feng
collection PubMed
description BACKGROUND: Analysis of gene expression data for tumor classification is an important application of bioinformatics methods. However, gene expression data from DNA microarray experiments are hard to analyse with commonly used classifiers, because the data sets contain only a few observations but thousands of measured genes. Dimension reduction is often used to handle such high-dimensional problems, but it is hampered by the large number of redundant features in microarray data sets. RESULTS: Dimension reduction is performed by combining feature extraction with redundant gene elimination for tumor classification. A novel redundancy metric based on DIScriminative Contribution (DISC) is proposed, which estimates feature similarity by explicitly building a linear classifier on each gene. Compared with the standard linear correlation metric, DISC takes the label information into account and directly estimates the redundancy of the discriminative ability of two given features. Based on the DISC metric, a novel algorithm named REDISC (Redundancy Elimination based on Discriminative Contribution) is proposed, which eliminates redundant genes before feature extraction and improves the performance of dimension reduction. Experimental results on two microarray data sets show that the REDISC algorithm effectively and reliably improves the generalization performance of dimension reduction and hence of the classifier used. CONCLUSION: For tumor classification, performing redundant gene elimination before feature extraction gives better dimension reduction than feature extraction alone, and supervised redundant gene elimination is superior to commonly used unsupervised methods such as linear correlation coefficients.
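The idea in the abstract (fit a linear classifier on each gene, compare what two genes contribute to discrimination, and drop redundant genes before feature extraction) can be illustrated with a small sketch. The abstract does not give the DISC formula, so everything below is an assumption made for illustration: the per-gene classifier is a midpoint threshold between the two class means, redundancy is measured as the Jaccard overlap of the samples each gene classifies correctly, and the names `correct_set`, `redundancy`, `eliminate_redundant` and the `max_redundancy` threshold are hypothetical. It is not the authors' REDISC implementation.

```python
from statistics import mean


def correct_set(values, labels):
    """Fit a 1-D midpoint-threshold classifier on one gene and return
    the indices of the samples it classifies correctly (assumed stand-in
    for the per-gene linear classifier mentioned in the abstract)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    mu_pos, mu_neg = mean(pos), mean(neg)
    threshold = (mu_pos + mu_neg) / 2
    sign = 1 if mu_pos >= mu_neg else -1  # which side of the threshold is class 1
    return {
        i
        for i, (v, y) in enumerate(zip(values, labels))
        if (1 if sign * (v - threshold) > 0 else 0) == y
    }


def redundancy(c_i, c_j):
    """Jaccard overlap of two correctly-classified sample sets.
    This is an illustrative label-aware similarity, not the DISC metric."""
    union = c_i | c_j
    return len(c_i & c_j) / len(union) if union else 1.0


def eliminate_redundant(genes, labels, max_redundancy=0.9):
    """Greedy filter: visit genes from most to least accurate and keep a
    gene only if it is not too redundant with any gene already kept."""
    correct = {name: correct_set(vals, labels) for name, vals in genes.items()}
    ranked = sorted(genes, key=lambda g: len(correct[g]), reverse=True)
    kept = []
    for g in ranked:
        if all(redundancy(correct[g], correct[k]) < max_redundancy for k in kept):
            kept.append(g)
    return kept


# Toy data: six samples, three genes (made up for this example).
labels = [0, 0, 0, 1, 1, 1]
genes = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # separates the classes
    "g2": [2.0, 2.4, 1.8, 6.0, 6.2, 5.8],  # scaled copy of g1
    "g3": [5.0, 1.0, 5.2, 1.1, 4.9, 1.2],  # weaker, different error pattern
}
kept = eliminate_redundant(genes, labels)
print(kept)  # ['g1', 'g3']
```

In this toy run `g2` is dropped because it classifies exactly the same samples correctly as `g1`, while `g3` survives because its errors fall on different samples. The surviving genes would then be passed to a feature extraction step, which is the ordering the abstract argues for.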
format Text
id pubmed-2423430
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2423430 2008-06-11 Dimension reduction with redundant gene elimination for tumor classification Zeng, Xue-Qiang Li, Guo-Zheng Yang, Jack Y Yang, Mary Qu Wu, Geng-Feng BMC Bioinformatics Research
BioMed Central 2008-05-28 /pmc/articles/PMC2423430/ /pubmed/18541061 http://dx.doi.org/10.1186/1471-2105-9-S6-S8 Text en Copyright © 2008 Zeng et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Dimension reduction with redundant gene elimination for tumor classification
topic Research