A Guide for Sparse PCA: Model Comparison and Applications

PCA is a popular tool for exploring and summarizing multivariate data, especially those consisting of many variables. PCA, however, is often not simple to interpret, as the components are a linear combination of the variables. To address this issue, numerous methods have been proposed to sparsify th...

Bibliographic Details
Main authors: Guerra-Urzola, Rosember; Van Deun, Katrijn; Vera, Juan C.; Sijtsma, Klaas
Format: Online Article Text
Language: English
Published: Springer US 2021
Subjects: Application Reviews and Case Studies
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8636462/
https://www.ncbi.nlm.nih.gov/pubmed/34185214
http://dx.doi.org/10.1007/s11336-021-09773-2
author Guerra-Urzola, Rosember
Van Deun, Katrijn
Vera, Juan C.
Sijtsma, Klaas
collection PubMed
description PCA is a popular tool for exploring and summarizing multivariate data, especially data consisting of many variables. PCA, however, is often not simple to interpret, as each component is a linear combination of all the variables. To address this issue, numerous methods have been proposed to sparsify the coefficients in the components, including rotation-thresholding methods and, more recently, PCA methods subject to sparsity-inducing penalties or constraints. Here, we offer guidelines on how to choose among the different sparse PCA methods. The current literature lacks clear guidance on the properties and performance of the different sparse PCA methods, and often relies on the misconception that the equivalence of the formulations for ordinary PCA also holds for sparse PCA. To guide potential users of sparse PCA methods, we first discuss several popular sparse PCA methods in terms of where the sparseness is imposed (on the loadings or on the weights), the assumed model, and the optimization criterion used to impose sparseness. Second, using an extensive simulation study, we assess each of these methods by means of performance measures such as squared relative error, misidentification rate, and percentage of explained variance for several data generating models and conditions for the population model. Finally, two examples using empirical data are considered. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11336-021-09773-2.
format Online
Article
Text
id pubmed-8636462
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-8636462 2021-12-03 A Guide for Sparse PCA: Model Comparison and Applications Guerra-Urzola, Rosember Van Deun, Katrijn Vera, Juan C. Sijtsma, Klaas Psychometrika Application Reviews and Case Studies
Springer US 2021-06-29 2021 /pmc/articles/PMC8636462/ /pubmed/34185214 http://dx.doi.org/10.1007/s11336-021-09773-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title A Guide for Sparse PCA: Model Comparison and Applications
topic Application Reviews and Case Studies