
Stochastic convex sparse principal component analysis

Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, traditional PCA has an obvious disadvantage when it is applied to analyze data where interpretability is important. In applications where the features have physical meanings, we lose the ability to interpret the principal components extracted by conventional PCA because each principal component is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity to the loading vectors of the principal components. Sparse PCA can be formulated as an ℓ1-regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well because computation of the exact gradient is generally required at each iteration. The stochastic gradient framework addresses this challenge by computing an expected gradient at each iteration. Nevertheless, stochastic approaches typically have low convergence rates due to high variance. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance-reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition which allows a broader class of objectives to be applied. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.

Bibliographic Details
Main Authors: Baytas, Inci M., Lin, Kaixiang, Wang, Fei, Jain, Anil K., Zhou, Jiayu
Format: Online Article Text
Language: English
Published: Springer International Publishing 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018037/
https://www.ncbi.nlm.nih.gov/pubmed/27660635
http://dx.doi.org/10.1186/s13637-016-0045-x
_version_ 1782452849928568832
author Baytas, Inci M.
Lin, Kaixiang
Wang, Fei
Jain, Anil K.
Zhou, Jiayu
author_facet Baytas, Inci M.
Lin, Kaixiang
Wang, Fei
Jain, Anil K.
Zhou, Jiayu
author_sort Baytas, Inci M.
collection PubMed
description Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, traditional PCA has an obvious disadvantage when it is applied to analyze data where interpretability is important. In applications where the features have physical meanings, we lose the ability to interpret the principal components extracted by conventional PCA because each principal component is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity to the loading vectors of the principal components. Sparse PCA can be formulated as an ℓ1-regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well because computation of the exact gradient is generally required at each iteration. The stochastic gradient framework addresses this challenge by computing an expected gradient at each iteration. Nevertheless, stochastic approaches typically have low convergence rates due to high variance. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance-reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition which allows a broader class of objectives to be applied. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.
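The proximal variance-reduced stochastic scheme the abstract describes can be sketched as a Prox-SVRG-style loop, where the ℓ1 proximal step is elementwise soft-thresholding. The sketch below uses a least-squares loss as a stand-in for a generic smooth term; the objective, step size, and epoch counts are illustrative assumptions, not the paper's Cvx-SPCA formulation:

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||x||_1 (elementwise soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_svrg(A, b, lam, step, n_epochs=20, inner=None, seed=0):
    """Proximal variance-reduced stochastic gradient (Prox-SVRG-style) for
    min_w (1/2n)||Aw - b||^2 + lam * ||w||_1.

    Illustrative sketch only: the least-squares loss stands in for the
    smooth part of an ℓ1-regularized objective such as sparse PCA's."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    inner = inner or n
    w = np.zeros(d)
    for _ in range(n_epochs):
        w_snap = w.copy()
        # Exact (full) gradient at the snapshot, computed once per epoch.
        full_grad = A.T @ (A @ w_snap - b) / n
        for _ in range(inner):
            i = rng.integers(n)
            gi = A[i] * (A[i] @ w - b[i])          # stochastic gradient at w
            gi_snap = A[i] * (A[i] @ w_snap - b[i])  # same sample at snapshot
            v = gi - gi_snap + full_grad            # variance-reduced direction
            w = soft_threshold(w - step * v, step * lam)
    return w
```

Because the correction term `gi - gi_snap` vanishes as the iterate approaches the snapshot, the variance of the search direction shrinks over epochs, which is what permits a constant step size and the geometric convergence rate the abstract refers to.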
format Online
Article
Text
id pubmed-5018037
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-5018037 2016-09-20 Stochastic convex sparse principal component analysis Baytas, Inci M. Lin, Kaixiang Wang, Fei Jain, Anil K. Zhou, Jiayu EURASIP J Bioinform Syst Biol Research
Springer International Publishing 2016-09-09 /pmc/articles/PMC5018037/ /pubmed/27660635 http://dx.doi.org/10.1186/s13637-016-0045-x Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle Research
Baytas, Inci M.
Lin, Kaixiang
Wang, Fei
Jain, Anil K.
Zhou, Jiayu
Stochastic convex sparse principal component analysis
title Stochastic convex sparse principal component analysis
title_full Stochastic convex sparse principal component analysis
title_fullStr Stochastic convex sparse principal component analysis
title_full_unstemmed Stochastic convex sparse principal component analysis
title_short Stochastic convex sparse principal component analysis
title_sort stochastic convex sparse principal component analysis
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018037/
https://www.ncbi.nlm.nih.gov/pubmed/27660635
http://dx.doi.org/10.1186/s13637-016-0045-x
work_keys_str_mv AT baytasincim stochasticconvexsparseprincipalcomponentanalysis
AT linkaixiang stochasticconvexsparseprincipalcomponentanalysis
AT wangfei stochasticconvexsparseprincipalcomponentanalysis
AT jainanilk stochasticconvexsparseprincipalcomponentanalysis
AT zhoujiayu stochasticconvexsparseprincipalcomponentanalysis