Cargando…

Stochastic convex sparse principal component analysis

Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, there is an obvious dis...

Descripción completa

Detalles Bibliográficos
Autores principales: Baytas, Inci M., Lin, Kaixiang, Wang, Fei, Jain, Anil K., Zhou, Jiayu
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Springer International Publishing 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018037/
https://www.ncbi.nlm.nih.gov/pubmed/27660635
http://dx.doi.org/10.1186/s13637-016-0045-x
Descripción
Sumario:Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, there is an obvious disadvantage of traditional PCA when it is applied to analyze data where interpretability is important. In applications, where the features have some physical meanings, we lose the ability to interpret the principal components extracted by conventional PCA because each principal component is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity to the loading vectors of principal components. The sparse PCA can be formulated as an ℓ (1) regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well because computation of the exact gradient is generally required at each iteration. Stochastic gradient framework addresses this challenge by computing an expected gradient at each iteration. Nevertheless, stochastic approaches typically have low convergence rates due to the high variance. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition which allows a broader class of objectives to be applied. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.