Shape-aware stochastic neighbor embedding for robust data visualisations

BACKGROUND: The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has emerged as one of the leading methods for visualising high-dimensional (HD) data in a wide variety of fields, especially for revealing cluster structure in HD single-cell transcriptomics data. However, t-SNE often fail...

Bibliographic Details
Main authors:	Wängberg, Tobias, Tyrcha, Joanna, Li, Chun-Biu
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9660178/ https://www.ncbi.nlm.nih.gov/pubmed/36376789 http://dx.doi.org/10.1186/s12859-022-05028-8

author	Wängberg, Tobias; Tyrcha, Joanna; Li, Chun-Biu
collection	PubMed
description	BACKGROUND: The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has emerged as one of the leading methods for visualising high-dimensional (HD) data in a wide variety of fields, especially for revealing cluster structure in HD single-cell transcriptomics data. However, t-SNE often fails to correctly represent hierarchical relationships between clusters and creates spurious patterns in the embedding. In this work we generalised t-SNE using shape-aware graph distances to mitigate some of the limitations of t-SNE. Although many methods have recently been proposed to circumvent the shortcomings of t-SNE, notably Uniform Manifold Approximation and Projection (UMAP) and Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE), we see a clear advantage of the proposed graph-based method. RESULTS: The superior performance of the proposed method is first demonstrated on simulated data, where a significant improvement compared to t-SNE, UMAP and PHATE, based on quantitative validation indices, is observed when visualising imbalanced, nonlinear, continuous and hierarchically structured data. Thereafter, the ability of the proposed method, compared to the competing methods, to create faithful low-dimensional embeddings is shown on two real-world data sets: single-cell transcriptomics data and the MNIST image data. In addition, the only hyper-parameter of the method can be chosen automatically in a data-driven way, and this choice is consistently optimal across all test cases in this study. CONCLUSIONS: In this work we show that the proposed shape-aware stochastic neighbor embedding method creates low-dimensional visualisations that robustly and accurately reveal key structures of high-dimensional data.
format	Online Article Text
id	pubmed-9660178
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9660178 2022-11-14 Shape-aware stochastic neighbor embedding for robust data visualisations Wängberg, Tobias; Tyrcha, Joanna; Li, Chun-Biu BMC Bioinformatics Research
BioMed Central 2022-11-14 /pmc/articles/PMC9660178/ /pubmed/36376789 http://dx.doi.org/10.1186/s12859-022-05028-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Shape-aware stochastic neighbor embedding for robust data visualisations
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9660178/ https://www.ncbi.nlm.nih.gov/pubmed/36376789 http://dx.doi.org/10.1186/s12859-022-05028-8