Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data

DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem arising from the use of microarray data is the difficulty of analyzing the high-dimensional gene...

Bibliographic Details
Main authors: Tan, Yongxi, Shi, Leming, Tong, Weida, Wang, Charles
Format: Text
Language: English
Published: Oxford University Press 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC546133/
https://www.ncbi.nlm.nih.gov/pubmed/15640445
http://dx.doi.org/10.1093/nar/gki144
author Tan, Yongxi
Shi, Leming
Tong, Weida
Wang, Charles
collection PubMed
description DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem arising from the use of microarray data is the difficulty of analyzing the high-dimensional gene expression data, typically with thousands of variables (genes) and far fewer observations (samples), in which severe collinearity is often observed. This makes it difficult to apply classical statistical methods directly to microarray data. In this paper, total principal component regression (TPCR) was proposed to classify human tumors by extracting the latent variable structure underlying microarray data from the augmented subspace of both independent variables and dependent variables. One of the salient features of our method is that it takes into account not only the latent variable structure but also the errors in the microarray gene expression profiles (independent variables). The prediction performance of TPCR was evaluated by both leave-one-out and leave-half-out cross-validation using four well-known microarray datasets. The stability and reliability of the classification models were further assessed by re-randomization and permutation studies. A fast kernel algorithm was applied to decrease the computation time dramatically. (MATLAB source code is available upon request.)
format Text
id pubmed-546133
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-546133 2005-02-07 Nucleic Acids Res Article Oxford University Press 2005 2005-01-07 /pmc/articles/PMC546133/ /pubmed/15640445 http://dx.doi.org/10.1093/nar/gki144 Text en © 2005, the authors Nucleic Acids Research, Vol. 33 No. 1 © Oxford University Press 2005; all rights reserved
title Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC546133/
https://www.ncbi.nlm.nih.gov/pubmed/15640445
http://dx.doi.org/10.1093/nar/gki144