Supervised capacity preserving mapping: a clustering guided visualization method for scRNA-seq data

Bibliographic Details
Main authors: Zhai, Zhiqian; Lei, Yu L; Wang, Rongrong; Xie, Yuying
Format: Online article (text)
Language: English
Published: Oxford University Press, 2022
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9048682/
https://www.ncbi.nlm.nih.gov/pubmed/35253834
http://dx.doi.org/10.1093/bioinformatics/btac131
Collection: PubMed
Description:
MOTIVATION: The rapid development of scRNA-seq technologies enables us to explore the transcriptome at the cell level on a large scale. Various computational methods have recently been developed to analyze scRNA-seq data, such as clustering and visualization. However, current visualization methods, including t-SNE and UMAP, have limited accuracy in rendering the geometric relationships between populations with distinct functional states. Most visualization methods are unsupervised and leave out the information contained in clustering results or given labels, which leads to an inaccurate depiction of the distances between bona fide functional states. In particular, UMAP and t-SNE are not optimal for preserving the global geometric structure: clusters that lie close together in the embedded dimensions may in fact be far apart in the original dimensions. In addition, UMAP and t-SNE cannot track the variance of clusters. In t-SNE and UMAP embeddings, the variance of a cluster is associated not only with its true variance but also with its sample size, to which it is proportional.

RESULTS: We present supCPM, a robust supervised visualization method that separates different clusters, preserves the global structure and tracks the cluster variance. Compared with six visualization methods on synthetic and real datasets, supCPM shows improved performance over the other methods in preserving the global geometric structure and data variance. Overall, supCPM provides an enhanced visualization pipeline that assists the interpretation of functional transitions and accurately depicts population segregation.

AVAILABILITY AND IMPLEMENTATION: The R package and source code are available at https://zenodo.org/record/5975977#.YgqR1PXMJjM.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Institution: National Center for Biotechnology Information
Journal: Bioinformatics (Original Papers), published online 2022-03-07
Rights: © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com