Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data
DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem arising from the use of microarray data is the difficulty of analyzing the high-dimensional gene...
Main authors: | Tan, Yongxi; Shi, Leming; Tong, Weida; Wang, Charles |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2005 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC546133/ https://www.ncbi.nlm.nih.gov/pubmed/15640445 http://dx.doi.org/10.1093/nar/gki144 |
_version_ | 1782122234818592768 |
---|---|
author | Tan, Yongxi Shi, Leming Tong, Weida Wang, Charles |
author_facet | Tan, Yongxi Shi, Leming Tong, Weida Wang, Charles |
author_sort | Tan, Yongxi |
collection | PubMed |
description | DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem arising from the use of microarray data is the difficulty of analyzing high-dimensional gene expression data, which typically have thousands of variables (genes) and far fewer observations (samples) and often show severe collinearity. This makes it difficult to apply classical statistical methods directly to microarray data. In this paper, total principal component regression (TPCR) is proposed to classify human tumors by extracting the latent variable structure underlying microarray data from the augmented subspace of both independent variables and dependent variables. One of the salient features of our method is that it takes into account not only the latent variable structure but also the errors in the microarray gene expression profiles (independent variables). The prediction performance of TPCR was evaluated by both leave-one-out and leave-half-out cross-validation using four well-known microarray datasets. The stability and reliability of the classification models were further assessed by re-randomization and permutation studies. A fast kernel algorithm was applied to decrease the computation time dramatically. (MATLAB source code is available upon request.) |
format | Text |
id | pubmed-546133 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-546133 2005-02-07 Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data Tan, Yongxi Shi, Leming Tong, Weida Wang, Charles Nucleic Acids Res Article DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem arising from the use of microarray data is the difficulty of analyzing high-dimensional gene expression data, which typically have thousands of variables (genes) and far fewer observations (samples) and often show severe collinearity. This makes it difficult to apply classical statistical methods directly to microarray data. In this paper, total principal component regression (TPCR) is proposed to classify human tumors by extracting the latent variable structure underlying microarray data from the augmented subspace of both independent variables and dependent variables. One of the salient features of our method is that it takes into account not only the latent variable structure but also the errors in the microarray gene expression profiles (independent variables). The prediction performance of TPCR was evaluated by both leave-one-out and leave-half-out cross-validation using four well-known microarray datasets. The stability and reliability of the classification models were further assessed by re-randomization and permutation studies. A fast kernel algorithm was applied to decrease the computation time dramatically. (MATLAB source code is available upon request.) Oxford University Press 2005 2005-01-07 /pmc/articles/PMC546133/ /pubmed/15640445 http://dx.doi.org/10.1093/nar/gki144 Text en © 2005, the authors Nucleic Acids Research, Vol. 33 No. 1 © Oxford University Press 2005; all rights reserved |
spellingShingle | Article Tan, Yongxi Shi, Leming Tong, Weida Wang, Charles Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data |
title | Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data |
title_full | Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data |
title_fullStr | Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data |
title_full_unstemmed | Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data |
title_short | Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data |
title_sort | multi-class cancer classification by total principal component regression (tpcr) using microarray gene expression data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC546133/ https://www.ncbi.nlm.nih.gov/pubmed/15640445 http://dx.doi.org/10.1093/nar/gki144 |
work_keys_str_mv | AT tanyongxi multiclasscancerclassificationbytotalprincipalcomponentregressiontpcrusingmicroarraygeneexpressiondata AT shileming multiclasscancerclassificationbytotalprincipalcomponentregressiontpcrusingmicroarraygeneexpressiondata AT tongweida multiclasscancerclassificationbytotalprincipalcomponentregressiontpcrusingmicroarraygeneexpressiondata AT wangcharles multiclasscancerclassificationbytotalprincipalcomponentregressiontpcrusingmicroarraygeneexpressiondata |
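
The description field mentions extracting latent structure from the augmented subspace of independent and dependent variables and evaluating it by leave-one-out cross-validation. As a rough illustration of that idea only, the following Python sketch fits a truncated total least squares regression on the augmented matrix [X | Y] and scores it by leave-one-out. It is not the authors' TPCR code (which the record says is MATLAB, available on request) and omits their fast kernel algorithm; the function names, the one-hot class coding, and the choice of `k` are assumptions made for this example.

```python
import numpy as np


def fit_truncated_tls(X, Y, k):
    """Truncated total least squares fit of Y ~ X @ B using k components.

    Hypothetical helper: a stand-in for the augmented-subspace idea,
    not the published TPCR algorithm.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    # Augmented, column-centered matrix of genes and class indicators.
    Z = np.hstack([X - x_mean, Y - y_mean])
    # Economy SVD keeps the cost tied to the (small) number of samples.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    Vk = Vt[:k].T                  # leading k right singular vectors
    p = X.shape[1]
    V11, V21 = Vk[:p], Vk[p:]      # gene block and class block
    # Minimum-norm truncated TLS solution: B = pinv(V11^T) @ V21^T.
    B = np.linalg.pinv(V11.T) @ V21.T
    return B, x_mean, y_mean


def predict_classes(X, B, x_mean, y_mean):
    """Assign each sample to the class with the largest fitted score."""
    scores = (X - x_mean) @ B + y_mean
    return scores.argmax(axis=1)


def leave_one_out_accuracy(X, labels, n_classes, k):
    """Refit with each sample held out once and count correct calls."""
    Y = np.eye(n_classes)[labels]  # one-hot coding (an assumption)
    hits = 0
    for i in range(len(labels)):
        train = np.arange(len(labels)) != i
        B, xm, ym = fit_truncated_tls(X[train], Y[train], k)
        hits += int(predict_classes(X[i:i + 1], B, xm, ym)[0] == labels[i])
    return hits / len(labels)
```

Here `X` would be a samples-by-genes expression matrix and `labels` an integer array of tumor classes. In practice `k` would itself be chosen by cross-validation rather than fixed in advance.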