Subject clustering by IF-PCA and several recent methods

Subject clustering (i.e., the use of measured features to cluster subjects, such as patients or cells, into multiple groups) is a problem of significant interest. In recent years, many approaches have been proposed, among which unsupervised deep learning (UDL) has received much attention. Two intere...

Bibliographic Details
Main authors: Chen, Dieyi; Jin, Jiashun; Ke, Zheng Tracy
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242062/
https://www.ncbi.nlm.nih.gov/pubmed/37287536
http://dx.doi.org/10.3389/fgene.2023.1166404
author Chen, Dieyi
Jin, Jiashun
Ke, Zheng Tracy
collection PubMed
description Subject clustering (i.e., the use of measured features to cluster subjects, such as patients or cells, into multiple groups) is a problem of significant interest. In recent years, many approaches have been proposed, among which unsupervised deep learning (UDL) has received much attention. Two interesting questions are 1) how to combine the strengths of UDL and other approaches and 2) how these approaches compare to each other. We combine the variational auto-encoder (VAE), a popular UDL approach, with the recent idea of influential feature-principal component analysis (IF-PCA) and propose IF-VAE as a new method for subject clustering. We study IF-VAE and compare it with several other methods (including IF-PCA, VAE, Seurat, and SC3) on 10 gene microarray data sets and eight single-cell RNA-seq data sets. We find that IF-VAE shows significant improvement over VAE, but still underperforms compared to IF-PCA. We also find that IF-PCA is quite competitive, slightly outperforming Seurat and SC3 over the eight single-cell data sets. IF-PCA is conceptually simple and permits delicate analysis. We demonstrate that IF-PCA is capable of achieving phase transition in a rare/weak model. Comparatively, Seurat and SC3 are more complex and theoretically difficult to analyze (for these reasons, their optimality remains unclear).
format Online
Article
Text
id pubmed-10242062
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10242062 2023-06-07 Subject clustering by IF-PCA and several recent methods. Chen, Dieyi; Jin, Jiashun; Ke, Zheng Tracy. Front Genet. Genetics. Frontiers Media S.A. 2023-05-23 /pmc/articles/PMC10242062/ /pubmed/37287536 http://dx.doi.org/10.3389/fgene.2023.1166404 Text en Copyright © 2023 Chen, Jin and Ke. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Subject clustering by IF-PCA and several recent methods
topic Genetics