Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis
Single-cell RNA-seq (scRNA-seq) is a powerful tool for analyzing heterogeneous and functionally diverse cell populations. Visualizing scRNA-seq data can help us effectively extract meaningful biological information and identify novel cell subtypes. Currently, the most popular methods for scRNA-seq vi...
Main author: | Liu, Zhenqiu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7460854/ https://www.ncbi.nlm.nih.gov/pubmed/32806757 http://dx.doi.org/10.3390/ijms21165797 |
_version_ | 1783576692043808768 |
---|---|
author | Liu, Zhenqiu |
author_facet | Liu, Zhenqiu |
author_sort | Liu, Zhenqiu |
collection | PubMed |
description | Single-cell RNA-seq (scRNA-seq) is a powerful tool for analyzing heterogeneous and functionally diverse cell populations. Visualizing scRNA-seq data can help us effectively extract meaningful biological information and identify novel cell subtypes. Currently, the most popular methods for scRNA-seq visualization are principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). While PCA is an unsupervised dimension reduction technique, t-SNE incorporates cluster information into pairwise probabilities and then minimizes the Kullback–Leibler divergence. Uniform Manifold Approximation and Projection (UMAP) is another recently developed visualization method similar to t-SNE. However, one limitation of UMAP and t-SNE is that they capture only the local structure of the data; the global structure is not faithfully preserved. In this manuscript, we propose a semisupervised principal component analysis (ssPCA) approach for scRNA-seq visualization. The proposed approach incorporates cluster labels into dimension reduction and discovers principal components that maximize both data variance and cluster dependence. ssPCA requires cluster labels as its input. Therefore, it is most useful for visualizing clusters produced by scRNA-seq clustering software. Our experiments with simulated and real scRNA-seq data demonstrate that ssPCA is able to preserve both the local and global structures of the data, and to uncover transitions and progressions in the data, if they exist. In addition, ssPCA is convex and has a global optimal solution. It is also robust and computationally efficient, making it viable for scRNA-seq cluster visualization. |
format | Online Article Text |
id | pubmed-7460854 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7460854 2020-09-14 Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis Liu, Zhenqiu Int J Mol Sci Article Single-cell RNA-seq (scRNA-seq) is a powerful tool for analyzing heterogeneous and functionally diverse cell populations. Visualizing scRNA-seq data can help us effectively extract meaningful biological information and identify novel cell subtypes. Currently, the most popular methods for scRNA-seq visualization are principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). While PCA is an unsupervised dimension reduction technique, t-SNE incorporates cluster information into pairwise probabilities and then minimizes the Kullback–Leibler divergence. Uniform Manifold Approximation and Projection (UMAP) is another recently developed visualization method similar to t-SNE. However, one limitation of UMAP and t-SNE is that they capture only the local structure of the data; the global structure is not faithfully preserved. In this manuscript, we propose a semisupervised principal component analysis (ssPCA) approach for scRNA-seq visualization. The proposed approach incorporates cluster labels into dimension reduction and discovers principal components that maximize both data variance and cluster dependence. ssPCA requires cluster labels as its input. Therefore, it is most useful for visualizing clusters produced by scRNA-seq clustering software. Our experiments with simulated and real scRNA-seq data demonstrate that ssPCA is able to preserve both the local and global structures of the data, and to uncover transitions and progressions in the data, if they exist. In addition, ssPCA is convex and has a global optimal solution. It is also robust and computationally efficient, making it viable for scRNA-seq cluster visualization. MDPI 2020-08-12 /pmc/articles/PMC7460854/ /pubmed/32806757 http://dx.doi.org/10.3390/ijms21165797 Text en © 2020 by the author. 
Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Liu, Zhenqiu Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis |
title | Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis |
title_full | Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis |
title_fullStr | Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis |
title_full_unstemmed | Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis |
title_short | Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis |
title_sort | visualizing single-cell rna-seq data with semisupervised principal component analysis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7460854/ https://www.ncbi.nlm.nih.gov/pubmed/32806757 http://dx.doi.org/10.3390/ijms21165797 |
work_keys_str_mv | AT liuzhenqiu visualizingsinglecellrnaseqdatawithsemisupervisedprincipalcomponentanalysis |
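
The description in this record says that ssPCA takes cluster labels as input and finds components that maximize both data variance and cluster dependence. The record does not give the article's actual objective, so the following is only a minimal sketch of one way such a blend could look, not the author's formulation. The function name `sspca_sketch`, the weight `alpha`, and the same-cluster indicator kernel are all assumptions made for illustration.

```python
import numpy as np

def sspca_sketch(X, labels, n_components=2, alpha=0.5):
    """Illustrative label-aware PCA (hypothetical, not the published ssPCA).

    X      : (n_cells, n_genes) expression matrix
    labels : (n_cells,) cluster ids from any clustering tool
    alpha  : assumed trade-off; 0 gives ordinary PCA, 1 uses labels only
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]

    # Centre each gene so that Xc.T @ Xc is proportional to the covariance.
    Xc = X - X.mean(axis=0)

    # Assumed label kernel: 1 when two cells share a cluster, else 0.
    L = (labels[:, None] == labels[None, :]).astype(float)

    # Centring matrix, applied to the label kernel on both sides.
    H = np.eye(n) - np.ones((n, n)) / n

    # Blend the variance term (identity) with the label-dependence term.
    K = (1.0 - alpha) * np.eye(n) + alpha * (H @ L @ H)

    # Symmetric positive semidefinite genes-by-genes matrix.
    M = Xc.T @ K @ Xc

    # Top eigenvectors give the projection directions.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]

    return Xc @ W, W
```

Under these assumptions the problem reduces to a single symmetric eigendecomposition, which is consistent with the record's remark that the method has a global optimum. Calling `sspca_sketch(X, labels, alpha=0.0)` recovers ordinary PCA directions, which gives a convenient baseline when comparing plots.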