Kernelized partial least squares for feature reduction and classification of gene microarray data
BACKGROUND: The primary objectives of this paper are: 1.) to apply Statistical Learning Theory (SLT), specifically Partial Least Squares (PLS) and Kernelized PLS (K-PLS), to the universal "feature-rich/case-poor" (also known as "large p small n", or "high-dimension, low-samp...
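Main authors: | Land, Walker H; Qiao, Xingye; Margolis, Daniel E; Ford, William S; Paquette, Christopher T; Perez-Rogers, Joseph F; Borgia, Jeffrey A; Yang, Jack Y; Deng, Youping |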
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287568/ https://www.ncbi.nlm.nih.gov/pubmed/22784619 http://dx.doi.org/10.1186/1752-0509-5-S3-S13 |
_version_ | 1782224693093203968 |
---|---|
author | Land, Walker H Qiao, Xingye Margolis, Daniel E Ford, William S Paquette, Christopher T Perez-Rogers, Joseph F Borgia, Jeffrey A Yang, Jack Y Deng, Youping |
author_sort | Land, Walker H |
collection | PubMed |
description | BACKGROUND: The primary objectives of this paper are: 1.) to apply Statistical Learning Theory (SLT), specifically Partial Least Squares (PLS) and Kernelized PLS (K-PLS), to the universal "feature-rich/case-poor" (also known as "large p small n", or "high-dimension, low-sample size") microarray problem by eliminating those features (or probes) that do not contribute to the "best" chromosome bio-markers for lung cancer, and 2.) quantitatively measure and verify (by an independent means) the efficacy of this PLS process. A secondary objective is to integrate these significant improvements in diagnostic and prognostic biomedical applications into the clinical research arena. That is, to devise a framework for converting SLT results into direct, useful clinical information for patient care or pharmaceutical research. We, therefore, propose and preliminarily evaluate, a process whereby PLS, K-PLS, and Support Vector Machines (SVM) may be integrated with the accepted and well understood traditional biostatistical "gold standard", Cox Proportional Hazard model and Kaplan-Meier survival analysis methods. Specifically, this new combination will be illustrated with both PLS and Kaplan-Meier followed by PLS and Cox Hazard Ratios (CHR) and can be easily extended for both the K-PLS and SVM paradigms. Finally, these previously described processes are contained in the Fine Feature Selection (FFS) component of our overall feature reduction/evaluation process, which consists of the following components: 1.) coarse feature reduction, 2.) fine feature selection and 3.) classification (as described in this paper) and prediction. RESULTS: Our results for PLS and K-PLS showed that these techniques, as part of our overall feature reduction process, performed well on noisy microarray data. 
The best performance was a good 0.794 Area Under a Receiver Operating Characteristic (ROC) Curve (AUC) for classification of recurrence prior to or after 36 months and a strong 0.869 AUC for classification of recurrence prior to or after 60 months. Kaplan-Meier curves for the classification groups were clearly separated, with p-values below 4.5e-12 for both 36 and 60 months. CHRs were also good, with ratios of 2.846341 (36 months) and 3.996732 (60 months). CONCLUSIONS: SLT techniques such as PLS and K-PLS can effectively address difficult problems with analyzing biomedical data such as microarrays. The combinations with established biostatistical techniques demonstrated in this paper allow these methods to move from academic research and into clinical practice. |
format | Online Article Text |
id | pubmed-3287568 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3287568 2012-03-01 Kernelized partial least squares for feature reduction and classification of gene microarray data Land, Walker H Qiao, Xingye Margolis, Daniel E Ford, William S Paquette, Christopher T Perez-Rogers, Joseph F Borgia, Jeffrey A Yang, Jack Y Deng, Youping BMC Syst Biol Research Article BioMed Central 2011-12-23 /pmc/articles/PMC3287568/ /pubmed/22784619 http://dx.doi.org/10.1186/1752-0509-5-S3-S13 Text en Copyright ©2011 Land et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Kernelized partial least squares for feature reduction and classification of gene microarray data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287568/ https://www.ncbi.nlm.nih.gov/pubmed/22784619 http://dx.doi.org/10.1186/1752-0509-5-S3-S13 |
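
The abstract in this record describes using PLS to discard uninformative probes in a "large p small n" setting and then judging the resulting classifier by its AUC. The following is a minimal sketch of that general idea only, not the authors' pipeline: it computes the first PLS1 weight vector on synthetic data, keeps the features with the largest absolute weights, and reports an in-sample AUC. The data generator, the function names, and every parameter value (40 samples, 200 features, 5 informative features, a class shift of 3.0) are assumptions made for illustration. A kernelized variant (K-PLS) would replace the inner products with kernel evaluations, and a real evaluation would use held-out data rather than the training scores.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve via Mann-Whitney pair counting (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def center_columns(X):
    """Subtract each column mean so every feature has mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def pls1_weights(X, y):
    """First PLS1 weight vector: proportional to Xc^T yc, scaled to unit length."""
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(len(X))) for j in range(len(X[0]))]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w], Xc


def select_and_score(X, y, k):
    """Keep the k features with the largest |weight|; score samples on them only."""
    w, Xc = pls1_weights(X, y)
    selected = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k]
    scores = [sum(row[j] * w[j] for j in selected) for row in Xc]
    return sorted(selected), scores


def make_toy_data(n=40, p=200, informative=5, shift=3.0, seed=0):
    """Hypothetical data: only the first `informative` columns depend on the label."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [
        [rng.gauss(0.0, 1.0) + (shift * y[i] if j < informative else 0.0) for j in range(p)]
        for i in range(n)
    ]
    return X, y


X, y = make_toy_data()
selected, scores = select_and_score(X, y, k=5)
toy_auc = auc(scores, y)
print("selected feature indices:", selected)
print("in-sample AUC on toy data:", round(toy_auc, 3))
```

Because the AUC here is computed on the same samples used to choose the features, it is optimistic by construction. The figures quoted in the abstract (0.794 and 0.869) come from the authors' own data and procedure and are not comparable with anything this toy example prints.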