Shape-aware stochastic neighbor embedding for robust data visualisations

BACKGROUND: The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has emerged as one of the leading methods for visualising high-dimensional (HD) data in a wide variety of fields, especially for revealing cluster structure in HD single-cell transcriptomics data. However, t-SNE often fail...

Bibliographic Details
Main authors: Wängberg, Tobias; Tyrcha, Joanna; Li, Chun-Biu
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9660178/
https://www.ncbi.nlm.nih.gov/pubmed/36376789
http://dx.doi.org/10.1186/s12859-022-05028-8
author Wängberg, Tobias
Tyrcha, Joanna
Li, Chun-Biu
collection PubMed
description BACKGROUND: The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has emerged as one of the leading methods for visualising high-dimensional (HD) data in a wide variety of fields, especially for revealing cluster structure in HD single-cell transcriptomics data. However, t-SNE often fails to correctly represent hierarchical relationships between clusters and creates spurious patterns in the embedding. In this work we generalised t-SNE using shape-aware graph distances to mitigate some of the limitations of t-SNE. Although many methods have recently been proposed to circumvent the shortcomings of t-SNE, notably Uniform Manifold Approximation and Projection (UMAP) and Potential of heat diffusion for affinity-based transition embedding (PHATE), we see a clear advantage of the proposed graph-based method. RESULTS: The superior performance of the proposed method is first demonstrated on simulated data, where a significant improvement compared with t-SNE, UMAP and PHATE, based on quantitative validation indices, is observed when visualising imbalanced, nonlinear, continuous and hierarchically structured data. Thereafter, the ability of the proposed method to create faithful low-dimensional embeddings, compared with the competing methods, is shown on two real-world data sets: single-cell transcriptomics data and the MNIST image data. In addition, the only hyper-parameter of the method can be chosen automatically in a data-driven way, and this choice is consistently optimal across all test cases in this study. CONCLUSIONS: In this work we show that the proposed shape-aware stochastic neighbor embedding method creates low-dimensional visualisations that robustly and accurately reveal key structures of high-dimensional data.
format Online
Article
Text
id pubmed-9660178
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9660178 2022-11-14 Shape-aware stochastic neighbor embedding for robust data visualisations Wängberg, Tobias Tyrcha, Joanna Li, Chun-Biu BMC Bioinformatics Research
BioMed Central 2022-11-14 /pmc/articles/PMC9660178/ /pubmed/36376789 http://dx.doi.org/10.1186/s12859-022-05028-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Shape-aware stochastic neighbor embedding for robust data visualisations
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9660178/
https://www.ncbi.nlm.nih.gov/pubmed/36376789
http://dx.doi.org/10.1186/s12859-022-05028-8