Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis

Bibliographic Details
Main author: Liu, Zhenqiu
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7460854/
https://www.ncbi.nlm.nih.gov/pubmed/32806757
http://dx.doi.org/10.3390/ijms21165797
author Liu, Zhenqiu
collection PubMed
description Single-cell RNA-seq (scRNA-seq) is a powerful tool for analyzing heterogeneous and functionally diverse cell populations. Visualizing scRNA-seq data helps extract meaningful biological information and identify novel cell subtypes. Currently, the most popular methods for scRNA-seq visualization are principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). PCA is an unsupervised linear dimension reduction technique, whereas t-SNE converts pairwise similarities between cells into probabilities and then minimizes the Kullback–Leibler divergence between the high-dimensional and low-dimensional probability distributions. Uniform Manifold Approximation and Projection (UMAP) is another recently developed visualization method similar to t-SNE. However, one limitation of UMAP and t-SNE is that they mainly capture the local structure of the data, while the global structure is not faithfully preserved. In this manuscript, we propose a semisupervised principal component analysis (ssPCA) approach for scRNA-seq visualization. The proposed approach incorporates cluster labels into dimension reduction and finds principal components that maximize both data variance and cluster dependence. Because ssPCA requires cluster labels as input, it is most useful for visualizing clusters produced by scRNA-seq clustering software. Our experiments with simulated and real scRNA-seq data demonstrate that ssPCA preserves both local and global structures of the data and uncovers transitions and progressions in the data, if they exist. In addition, ssPCA is convex and has a globally optimal solution. It is also robust and computationally efficient, making it viable for scRNA-seq cluster visualization.
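The description states only at a high level that ssPCA finds components maximizing both data variance and cluster dependence. The following is a minimal NumPy sketch of one way to combine those two goals in a single eigenproblem. It assumes a linear, HSIC-style dependence term built from a one-hot cluster-label kernel L, so the objective is trace(U^T Xc^T (I + gamma L) Xc U) with U^T U = I and Xc the gene-centered data. The function name `sspca_sketch`, the weight `gamma`, and this exact objective are illustrative assumptions and may differ from the formulation used in the article.

```python
import numpy as np


def sspca_sketch(X, labels, n_components=2, gamma=1.0):
    """Hypothetical semisupervised-PCA sketch (not the article's implementation).

    X      : (n_cells, n_genes) expression matrix, e.g. log-normalized counts.
    labels : length-n_cells cluster labels from a clustering tool.
    gamma  : assumed weight of the cluster-dependence term; gamma=0 is plain PCA.
    Returns the (n_cells, n_components) embedding and the gene loadings U.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Center each gene, i.e. Xc = H X with H = I - 11^T/n (H is never formed).
    Xc = X - X.mean(axis=0)

    # One-hot cluster indicator Y; the label kernel L = Y Y^T is 1 for
    # pairs of cells that share a cluster label and 0 otherwise.
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)

    # Variance term Xc^T Xc plus weighted dependence term Xc^T L Xc.
    # Xc^T L Xc = B^T B with B = Y^T Xc (per-cluster sums of centered data),
    # which avoids building an n_cells x n_cells matrix.
    B = Y.T @ Xc
    M = Xc.T @ Xc + gamma * (B.T @ B)

    # For symmetric M, the leading eigenvectors give a global maximizer of
    # trace(U^T M U) subject to U^T U = I, so no iterative solver is needed.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_components]
    U = eigvecs[:, order]
    return Xc @ U, U


# Toy data: gene 0 has large variance but no cluster signal; gene 1 has small
# variance but separates the two clusters.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
noise = rng.normal(0.0, 10.0, size=200)
for k in (0, 1):
    noise[labels == k] -= noise[labels == k].mean()  # remove cluster signal
signal = np.where(labels == 0, -1.0, 1.0) + rng.normal(0.0, 0.1, size=200)
X = np.column_stack([noise, signal])

_, U_pca = sspca_sketch(X, labels, n_components=1, gamma=0.0)
emb, U_ss = sspca_sketch(X, labels, n_components=1, gamma=10.0)
print("gamma=0 loading:", U_pca[:, 0])
print("gamma=10 loading:", U_ss[:, 0])
```

With gamma=0 the sketch reduces to ordinary PCA, so the first loading follows the high-variance but uninformative gene 0. With gamma=10 it turns toward gene 1, which separates the supplied clusters. Because the answer comes from one symmetric eigendecomposition, it is deterministic up to the sign of each component.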
format Online Article Text
id pubmed-7460854
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7460854, 2020-09-14; Int J Mol Sci; Article; MDPI, 2020-08-12; Text, en; © 2020 by the author.
Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis
topic Article