Stochastic convex sparse principal component analysis
Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, traditional PCA has an obvious disadvantage when it is applied to data where interpretability is important: in applications where the features carry physical meaning, the principal components extracted by conventional PCA are hard to interpret, because each one is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity into the loading vectors of the principal components. Sparse PCA can be formulated as an ℓ1-regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well, because the exact gradient generally has to be computed at each iteration. The stochastic gradient framework addresses this challenge by computing, at each iteration, a cheap stochastic gradient whose expectation is the exact gradient. Nevertheless, stochastic approaches typically converge slowly because of the high variance of these estimates. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance-reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition that admits a broader class of objectives. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.
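To make the abstract's two key ingredients concrete, the minimal sketch below pairs the proximal step for the ℓ1 penalty (elementwise soft-thresholding) with a variance-reduced stochastic gradient loop in the style of prox-SVRG. It is an illustration only, not the paper's Cvx-SPCA implementation: the abstract does not give the exact convex objective, so a plain ℓ1-regularized least-squares loss stands in for it, and the names (`soft_threshold`, `prox_svrg`) and parameters (`lam`, `eta`, epoch and inner-loop sizes) are assumptions chosen for the example.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1: elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_svrg(X, y, lam, eta=0.01, epochs=30, inner=None, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (x_i @ w - y_i)**2 + lam * ||w||_1
    with a proximal variance-reduced stochastic gradient loop."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    inner = inner if inner is not None else 2 * n  # a common heuristic
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()
        mu = X.T @ (X @ w_snap - y) / n  # full gradient, once per epoch
        for _ in range(inner):
            i = rng.integers(n)
            # Variance-reduced estimate: unbiased for the full gradient,
            # with variance that shrinks as w approaches the snapshot.
            g = X[i] * (X[i] @ w - y[i]) - X[i] * (X[i] @ w_snap - y[i]) + mu
            # The proximal step handles the nonsmooth l1 term exactly.
            w = soft_threshold(w - eta * g, eta * lam)
    return w

# Tiny usage example on synthetic data with a sparse ground truth.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(200)
w_hat = prox_svrg(X, y, lam=0.05, eta=0.005)
print("coefficients above 1e-3:", int((np.abs(w_hat) > 1e-3).sum()))
```

The variance reduction is the correction term in the inner loop: subtracting the stored gradient at the snapshot and adding back the full-gradient mean `mu` keeps the estimate unbiased while its variance shrinks as the iterates approach the snapshot. That is what permits a constant step size and, when the smooth part is strongly convex, the geometric convergence rate the abstract claims.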
Main Authors: | Baytas, Inci M.; Lin, Kaixiang; Wang, Fei; Jain, Anil K.; Zhou, Jiayu |
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2016 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018037/ https://www.ncbi.nlm.nih.gov/pubmed/27660635 http://dx.doi.org/10.1186/s13637-016-0045-x |
collection | PubMed |
id | pubmed-5018037 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | EURASIP J Bioinform Syst Biol
publishDate | 2016-09-09
license | © The Author(s) 2016. Open Access: this article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.