Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis
BACKGROUND: Dimension reduction is a critical issue in the analysis of microarray data, because the high dimensionality of gene expression microarray data sets hurts the generalization performance of classifiers. Dimension reduction comprises two types of methods: feature selection and feature extraction. Principl...
Main authors: | Li, Guo-Zheng; Bu, Hua-Long; Yang, Mary Qu; Zeng, Xue-Qiang; Yang, Jack Y |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559889/ https://www.ncbi.nlm.nih.gov/pubmed/18831790 http://dx.doi.org/10.1186/1471-2164-9-S2-S24 |
_version_ | 1782159687357038592 |
---|---|
author | Li, Guo-Zheng; Bu, Hua-Long; Yang, Mary Qu; Zeng, Xue-Qiang; Yang, Jack Y |
author_facet | Li, Guo-Zheng; Bu, Hua-Long; Yang, Mary Qu; Zeng, Xue-Qiang; Yang, Jack Y |
author_sort | Li, Guo-Zheng |
collection | PubMed |
description | BACKGROUND: Dimension reduction is a critical issue in the analysis of microarray data, because the high dimensionality of gene expression microarray data sets hurts the generalization performance of classifiers. Dimension reduction comprises two types of methods: feature selection and feature extraction. Principal component analysis (PCA) and partial least squares (PLS) are two frequently used feature extraction methods; in previous work, the top several components of PCA or PLS were selected for modeling according to the descending order of eigenvalues. In this paper, we show that not all of the top features are useful, and that features should instead be selected from all the components by feature selection methods. RESULTS: We demonstrate a framework for selecting feature subsets from all the newly extracted components, leading to reduced classification error rates on gene expression microarray data. We consider an unsupervised method (PCA) and a supervised method (PLS) for extracting new components, genetic algorithms for feature selection, and support vector machines and k-nearest neighbor for classification. Experimental results illustrate that the proposed framework is effective at selecting feature subsets and reducing classification error rates. CONCLUSION: The top features newly extracted by PCA or PLS are not the only important ones; therefore, feature selection should be performed to select subsets of the new features to improve the generalization performance of classifiers. |
format | Text |
id | pubmed-2559889 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2559889 2008-10-04 Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis Li, Guo-Zheng; Bu, Hua-Long; Yang, Mary Qu; Zeng, Xue-Qiang; Yang, Jack Y BMC Genomics Research BACKGROUND: Dimension reduction is a critical issue in the analysis of microarray data, because the high dimensionality of gene expression microarray data sets hurts the generalization performance of classifiers. Dimension reduction comprises two types of methods: feature selection and feature extraction. Principal component analysis (PCA) and partial least squares (PLS) are two frequently used feature extraction methods; in previous work, the top several components of PCA or PLS were selected for modeling according to the descending order of eigenvalues. In this paper, we show that not all of the top features are useful, and that features should instead be selected from all the components by feature selection methods. RESULTS: We demonstrate a framework for selecting feature subsets from all the newly extracted components, leading to reduced classification error rates on gene expression microarray data. We consider an unsupervised method (PCA) and a supervised method (PLS) for extracting new components, genetic algorithms for feature selection, and support vector machines and k-nearest neighbor for classification. Experimental results illustrate that the proposed framework is effective at selecting feature subsets and reducing classification error rates. CONCLUSION: The top features newly extracted by PCA or PLS are not the only important ones; therefore, feature selection should be performed to select subsets of the new features to improve the generalization performance of classifiers. BioMed Central 2008-09-16 /pmc/articles/PMC2559889/ /pubmed/18831790 http://dx.doi.org/10.1186/1471-2164-9-S2-S24 Text en Copyright © 2008 Li et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Li, Guo-Zheng; Bu, Hua-Long; Yang, Mary Qu; Zeng, Xue-Qiang; Yang, Jack Y Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis |
title | Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis |
title_full | Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis |
title_fullStr | Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis |
title_full_unstemmed | Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis |
title_short | Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis |
title_sort | selecting subsets of newly extracted features from pca and pls in microarray data analysis |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559889/ https://www.ncbi.nlm.nih.gov/pubmed/18831790 http://dx.doi.org/10.1186/1471-2164-9-S2-S24 |
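
The framework summarized in the description field (extract all components, select a subset of them with a genetic algorithm, then classify) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it covers PCA and a k-nearest-neighbor classifier but omits PLS and support vector machines, and it assumes binary labels coded 0/1. The function names (`pca_components`, `loo_knn_error`, `ga_select`), the GA settings, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def pca_components(X):
    """Return scores on ALL principal components, ordered by descending variance."""
    Xc = X - X.mean(axis=0)                      # center each gene (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T                             # shape: (n_samples, min(n, p))

def loo_knn_error(Z, y, k=1):
    """Leave-one-out error of a k-NN classifier (assumes labels are 0/1)."""
    if Z.shape[1] == 0:                          # empty subset: worst possible score
        return 1.0
    D = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(D, np.inf)                  # a sample may not vote for itself
    nn = np.argsort(D, axis=1)[:, :k]            # indices of the k nearest neighbors
    pred = (y[nn].mean(axis=1) > 0.5).astype(int)
    return float((pred != y).mean())

def ga_select(Z, y, top_k=5, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Toy genetic algorithm over boolean masks of components (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    m = Z.shape[1]
    pop = rng.random((pop_size, m)) < 0.5
    pop[0] = True                                # baseline: use every component
    pop[1] = np.arange(m) < top_k                # baseline: the usual "top k" choice
    best_mask, best_err = None, np.inf
    for _ in range(generations):
        errs = np.array([loo_knn_error(Z[:, ind], y) for ind in pop])
        i = int(errs.argmin())
        if errs[i] < best_err:
            best_err, best_mask = float(errs[i]), pop[i].copy()
        # Binary tournament selection
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((errs[a] <= errs[b])[:, None], pop[a], pop[b])
        # Uniform crossover with randomly assigned mates, then bit-flip mutation
        mates = parents[rng.permutation(pop_size)]
        children = np.where(rng.random((pop_size, m)) < 0.5, parents, mates)
        pop = children ^ (rng.random((pop_size, m)) < p_mut)
        pop[0] = best_mask                       # elitism: keep the best mask so far
    return best_mask, best_err

if __name__ == "__main__":
    # Synthetic stand-in for microarray data: few samples, many "genes".
    rng = np.random.default_rng(42)
    n, p = 40, 200
    y = np.array([0, 1] * (n // 2))
    X = rng.normal(size=(n, p))
    X[:, :5] *= 10.0                             # high-variance genes unrelated to class
    X[:, 5:15] += 1.5 * y[:, None]               # weaker, class-related signal
    Z = pca_components(X)
    mask, err = ga_select(Z, y, top_k=5)
    print("top-5 components, LOO error:", loo_knn_error(Z[:, :5], y))
    print("GA-selected components:", np.flatnonzero(mask), "LOO error:", err)
```

Because the initial population is seeded with the "all components" and "top k" masks, the selected subset is never worse than either baseline on the leave-one-out criterion used for the search. That criterion is also the selection objective, so an honest estimate of generalization error would need a separate held-out or nested cross-validation step.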