Subject clustering by IF-PCA and several recent methods
Subject clustering (i.e., the use of measured features to cluster subjects, such as patients or cells, into multiple groups) is a problem of significant interest. In recent years, many approaches have been proposed, among which unsupervised deep learning (UDL) has received much attention. Two intere...
Main authors: | Chen, Dieyi; Jin, Jiashun; Ke, Zheng Tracy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242062/ https://www.ncbi.nlm.nih.gov/pubmed/37287536 http://dx.doi.org/10.3389/fgene.2023.1166404 |
_version_ | 1785054129414995968 |
---|---|
author | Chen, Dieyi Jin, Jiashun Ke, Zheng Tracy |
collection | PubMed |
description | Subject clustering (i.e., the use of measured features to cluster subjects, such as patients or cells, into multiple groups) is a problem of significant interest. In recent years, many approaches have been proposed, among which unsupervised deep learning (UDL) has received much attention. Two interesting questions are 1) how to combine the strengths of UDL and other approaches and 2) how these approaches compare to each other. We combine the variational auto-encoder (VAE), a popular UDL approach, with the recent idea of influential feature-principal component analysis (IF-PCA) and propose IF-VAE as a new method for subject clustering. We study IF-VAE and compare it with several other methods (including IF-PCA, VAE, Seurat, and SC3) on 10 gene microarray data sets and eight single-cell RNA-seq data sets. We find that IF-VAE shows significant improvement over VAE, but still underperforms compared to IF-PCA. We also find that IF-PCA is quite competitive, slightly outperforming Seurat and SC3 over the eight single-cell data sets. IF-PCA is conceptually simple and permits delicate analysis. We demonstrate that IF-PCA is capable of achieving phase transition in a rare/weak model. Comparatively, Seurat and SC3 are more complex and theoretically difficult to analyze (for these reasons, their optimality remains unclear). |
format | Online Article Text |
id | pubmed-10242062 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10242062 2023-06-07 Subject clustering by IF-PCA and several recent methods Chen, Dieyi Jin, Jiashun Ke, Zheng Tracy Front Genet Genetics Subject clustering (i.e., the use of measured features to cluster subjects, such as patients or cells, into multiple groups) is a problem of significant interest. In recent years, many approaches have been proposed, among which unsupervised deep learning (UDL) has received much attention. Two interesting questions are 1) how to combine the strengths of UDL and other approaches and 2) how these approaches compare to each other. We combine the variational auto-encoder (VAE), a popular UDL approach, with the recent idea of influential feature-principal component analysis (IF-PCA) and propose IF-VAE as a new method for subject clustering. We study IF-VAE and compare it with several other methods (including IF-PCA, VAE, Seurat, and SC3) on 10 gene microarray data sets and eight single-cell RNA-seq data sets. We find that IF-VAE shows significant improvement over VAE, but still underperforms compared to IF-PCA. We also find that IF-PCA is quite competitive, slightly outperforming Seurat and SC3 over the eight single-cell data sets. IF-PCA is conceptually simple and permits delicate analysis. We demonstrate that IF-PCA is capable of achieving phase transition in a rare/weak model. Comparatively, Seurat and SC3 are more complex and theoretically difficult to analyze (for these reasons, their optimality remains unclear). Frontiers Media S.A. 2023-05-23 /pmc/articles/PMC10242062/ /pubmed/37287536 http://dx.doi.org/10.3389/fgene.2023.1166404 Text en Copyright © 2023 Chen, Jin and Ke. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Subject clustering by IF-PCA and several recent methods |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242062/ https://www.ncbi.nlm.nih.gov/pubmed/37287536 http://dx.doi.org/10.3389/fgene.2023.1166404 |