New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems

BACKGROUND: DNA microarrays are a potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease c...

Bibliographic Details
Main Authors: Thomas, Minta, Brabanter, Kris De, Moor, Bart De
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4025604/
https://www.ncbi.nlm.nih.gov/pubmed/24886083
http://dx.doi.org/10.1186/1471-2105-15-137
_version_ 1782316793309691904
author Thomas, Minta
Brabanter, Kris De
Moor, Bart De
author_facet Thomas, Minta
Brabanter, Kris De
Moor, Bart De
author_sort Thomas, Minta
collection PubMed
description BACKGROUND: DNA microarrays are a potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease cases and provide diagnostic confirmation or clarify abnormal cases. The main input to these class predictors is high-dimensional data with many variables and few observations. Dimensionality reduction of this feature set significantly speeds up the prediction task. Feature selection and feature transformation methods are well-known preprocessing steps in the field of bioinformatics. Several prediction tools are available based on these techniques. RESULTS: Studies show that a well-tuned Kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA is computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA, which is related to least squares cross-validation for kernel density estimation. We propose a new prediction model with a well-tuned KPCA and Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the newly proposed model based on nine case studies. Then, we compare its performance (in terms of test set Area Under the ROC Curve (AUC) and computational time) with that of other well-known techniques such as whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM) and Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we assess the performance of the proposed strategy against an existing KPCA parameter tuning algorithm by means of two additional case studies.
CONCLUSION: We propose, evaluate, and compare several mathematical/statistical techniques, which apply feature transformation/selection for subsequent classification, and consider their application in medical diagnostics. Both feature selection and feature transformation perform well on classification tasks. Due to the dynamic selection property of feature selection, it is hard to define significant features for the classifier, which predicts classes of future samples. Moreover, the proposed strategy enjoys a distinctive advantage with its relatively lower time complexity.
format Online
Article
Text
id pubmed-4025604
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-40256042014-05-30 New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems Thomas, Minta Brabanter, Kris De Moor, Bart De BMC Bioinformatics Methodology Article BioMed Central 2014-05-10 /pmc/articles/PMC4025604/ /pubmed/24886083 http://dx.doi.org/10.1186/1471-2105-15-137 Text en Copyright © 2014 Thomas et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
spellingShingle Methodology Article
Thomas, Minta
Brabanter, Kris De
Moor, Bart De
New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems
title New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems
title_full New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems
title_fullStr New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems
title_full_unstemmed New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems
title_short New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems
title_sort new bandwidth selection criterion for kernel pca: approach to dimensionality reduction and classification problems
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4025604/
https://www.ncbi.nlm.nih.gov/pubmed/24886083
http://dx.doi.org/10.1186/1471-2105-15-137
work_keys_str_mv AT thomasminta newbandwidthselectioncriterionforkernelpcaapproachtodimensionalityreductionandclassificationproblems
AT brabanterkrisde newbandwidthselectioncriterionforkernelpcaapproachtodimensionalityreductionandclassificationproblems
AT moorbartde newbandwidthselectioncriterionforkernelpcaapproachtodimensionalityreductionandclassificationproblems
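
The abstract relates the proposed KPCA bandwidth criterion to least squares cross-validation (LSCV) for kernel density estimation. The record does not give the criterion itself, so the following is only a minimal sketch of classical LSCV for a one-dimensional Gaussian kernel density estimate, not the authors' method. The function names (`gaussian_pdf`, `lscv_score`, `select_bandwidth`) and the grid-search approach are assumptions made for illustration.

```python
import numpy as np


def gaussian_pdf(d, s):
    """Density of N(0, s^2) evaluated at d (elementwise)."""
    return np.exp(-0.5 * (d / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def lscv_score(x, h):
    """Classical LSCV objective for a 1-D Gaussian KDE with bandwidth h.

    LSCV(h) = integral of fhat^2  -  (2/n) * sum_i fhat_{-i}(x_i),
    where fhat_{-i} is the leave-one-out density estimate.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]  # all pairwise differences

    # The convolution of two Gaussian kernels of width h is a Gaussian
    # of width sqrt(2) * h, which gives the integral in closed form.
    integral_sq = gaussian_pdf(d, np.sqrt(2.0) * h).sum() / n**2

    # Leave-one-out term: average kernel value over pairs with i != j.
    k = gaussian_pdf(d, h)
    loo = (k.sum() - np.trace(k)) / (n * (n - 1))

    return integral_sq - 2.0 * loo


def select_bandwidth(x, candidates):
    """Return the candidate bandwidth minimising LSCV, plus all scores."""
    candidates = np.asarray(candidates, dtype=float)
    scores = np.array([lscv_score(x, h) for h in candidates])
    return float(candidates[int(np.argmin(scores))]), scores
```

For example, calling `select_bandwidth(x, np.logspace(-2, 1, 40))` on a sample `x` returns the grid value with the lowest LSCV score. How such a score is carried over to the kernel matrix used in KPCA is described in the article, not in this record.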