SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis

Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a scalable...

Bibliographic Details
Main authors: Yuan, Dong; Mancuso, Nicholas
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10638022/
https://www.ncbi.nlm.nih.gov/pubmed/37953948
http://dx.doi.org/10.1016/j.isci.2023.108181
_version_ 1785133525513535488
author Yuan, Dong
Mancuso, Nicholas
author_facet Yuan, Dong
Mancuso, Nicholas
author_sort Yuan, Dong
collection PubMed
description Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a scalable sparse latent factor approach that evaluates uncertainty in contributing variables through posterior inclusion probabilities. We validate our model in extensive simulations and demonstrate that SuSiE PCA outperforms other approaches in signal detection and model robustness. We apply SuSiE PCA to multi-tissue expression quantitative trait loci (eQTLs) data from GTEx v8 and identify tissue-specific factors and their contributing eGenes. We further investigate its performance on the large-scale perturbation data and find that SuSiE PCA identifies modules with a higher enrichment of ribosome-related genes than sparse PCA (false discovery rate [FDR] [Formula: see text] vs. [Formula: see text]), while being [Formula: see text] 18x faster. Overall, SuSiE PCA provides an efficient tool to identify relevant features in high-dimensional biological data.
format Online
Article
Text
id pubmed-10638022
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10638022 2023-11-11 SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis Yuan, Dong Mancuso, Nicholas iScience Article Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a scalable sparse latent factor approach that evaluates uncertainty in contributing variables through posterior inclusion probabilities. We validate our model in extensive simulations and demonstrate that SuSiE PCA outperforms other approaches in signal detection and model robustness. We apply SuSiE PCA to multi-tissue expression quantitative trait loci (eQTLs) data from GTEx v8 and identify tissue-specific factors and their contributing eGenes. We further investigate its performance on the large-scale perturbation data and find that SuSiE PCA identifies modules with a higher enrichment of ribosome-related genes than sparse PCA (false discovery rate [FDR] [Formula: see text] vs. [Formula: see text]), while being [Formula: see text] 18x faster. Overall, SuSiE PCA provides an efficient tool to identify relevant features in high-dimensional biological data. Elsevier 2023-10-13 /pmc/articles/PMC10638022/ /pubmed/37953948 http://dx.doi.org/10.1016/j.isci.2023.108181 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Article
Yuan, Dong
Mancuso, Nicholas
SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis
title SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis
title_sort susie pca: a scalable bayesian variable selection technique for principal component analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10638022/
https://www.ncbi.nlm.nih.gov/pubmed/37953948
http://dx.doi.org/10.1016/j.isci.2023.108181
work_keys_str_mv AT yuandong susiepcaascalablebayesianvariableselectiontechniqueforprincipalcomponentanalysis
AT mancusonicholas susiepcaascalablebayesianvariableselectiontechniqueforprincipalcomponentanalysis