Stochastic convex sparse principal component analysis
Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, traditional PCA has an obvious disadvantage when it is applied to data where interpretability is important: in applications where the features carry physical meaning, the principal components extracted by conventional PCA are hard to interpret, because each one is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity into the loading vectors of the principal components. Sparse PCA can be formulated as an ℓ1-regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well, because the exact gradient generally has to be computed at each iteration. The stochastic gradient framework addresses this challenge by computing, at each iteration, a cheap stochastic gradient whose expectation is the exact gradient. Nevertheless, stochastic approaches typically converge slowly because of the high variance of these estimates. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance-reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition that admits a broader class of objectives. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.
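To make the abstract's two key ingredients concrete, the minimal sketch below pairs the proximal step for the ℓ1 penalty (elementwise soft-thresholding) with a variance-reduced stochastic gradient loop in the style of prox-SVRG. It is an illustration only, not the paper's Cvx-SPCA implementation: the abstract does not give the exact convex objective, so a plain ℓ1-regularized least-squares loss stands in for it, and the names (`soft_threshold`, `prox_svrg`) and parameters (`lam`, `eta`, epoch and inner-loop sizes) are assumptions chosen for the example.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1: elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_svrg(X, y, lam, eta=0.01, epochs=30, inner=None, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (x_i @ w - y_i)**2 + lam * ||w||_1
    with a proximal variance-reduced stochastic gradient loop."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    inner = inner if inner is not None else 2 * n  # a common heuristic
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()
        mu = X.T @ (X @ w_snap - y) / n  # full gradient, once per epoch
        for _ in range(inner):
            i = rng.integers(n)
            # Variance-reduced estimate: unbiased for the full gradient,
            # with variance that shrinks as w approaches the snapshot.
            g = X[i] * (X[i] @ w - y[i]) - X[i] * (X[i] @ w_snap - y[i]) + mu
            # The proximal step handles the nonsmooth l1 term exactly.
            w = soft_threshold(w - eta * g, eta * lam)
    return w

# Tiny usage example on synthetic data with a sparse ground truth.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(200)
w_hat = prox_svrg(X, y, lam=0.05, eta=0.005)
print("coefficients above 1e-3:", int((np.abs(w_hat) > 1e-3).sum()))
```

The variance reduction is the correction term in the inner loop: subtracting the stored gradient at the snapshot and adding back the full-gradient mean `mu` keeps the estimate unbiased while its variance shrinks as the iterates approach the snapshot. That is what permits a constant step size and, when the smooth part is strongly convex, the geometric convergence rate the abstract claims.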
Main Authors: | Baytas, Inci M.; Lin, Kaixiang; Wang, Fei; Jain, Anil K.; Zhou, Jiayu |
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2016 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018037/ https://www.ncbi.nlm.nih.gov/pubmed/27660635 http://dx.doi.org/10.1186/s13637-016-0045-x |
collection | PubMed |
id | pubmed-5018037 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | EURASIP J Bioinform Syst Biol
publishDate | 2016-09-09
license | © The Author(s) 2016. Open Access: this article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.