Shape-aware stochastic neighbor embedding for robust data visualisations
BACKGROUND: The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has emerged as one of the leading methods for visualising high-dimensional (HD) data in a wide variety of fields, especially for revealing cluster structure in HD single-cell transcriptomics data. However, t-SNE often fail...
Main authors: | Wängberg, Tobias; Tyrcha, Joanna; Li, Chun-Biu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9660178/ https://www.ncbi.nlm.nih.gov/pubmed/36376789 http://dx.doi.org/10.1186/s12859-022-05028-8 |
_version_ | 1784830366159208448 |
---|---|
author | Wängberg, Tobias Tyrcha, Joanna Li, Chun-Biu |
author_facet | Wängberg, Tobias Tyrcha, Joanna Li, Chun-Biu |
author_sort | Wängberg, Tobias |
collection | PubMed |
description | BACKGROUND: The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has emerged as one of the leading methods for visualising high-dimensional (HD) data in a wide variety of fields, especially for revealing cluster structure in HD single-cell transcriptomics data. However, t-SNE often fails to correctly represent hierarchical relationships between clusters and creates spurious patterns in the embedding. In this work we generalised t-SNE using shape-aware graph distances to mitigate some of the limitations of t-SNE. Although many methods have recently been proposed to circumvent the shortcomings of t-SNE, notably Uniform Manifold Approximation and Projection (UMAP) and Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE), we see a clear advantage of the proposed graph-based method. RESULTS: The superior performance of the proposed method is first demonstrated on simulated data, where a significant improvement over t-SNE, UMAP and PHATE, based on quantitative validation indices, is observed when visualising imbalanced, nonlinear, continuous and hierarchically structured data. Thereafter, the ability of the proposed method to create faithful low-dimensional embeddings, compared with the competing methods, is shown on two real-world data sets: single-cell transcriptomics data and the MNIST image data. In addition, the only hyper-parameter of the method can be chosen automatically in a data-driven way, and this choice is consistently optimal across all test cases in this study. CONCLUSIONS: In this work we show that the proposed shape-aware stochastic neighbor embedding method creates low-dimensional visualisations that robustly and accurately reveal key structures of high-dimensional data. |
format | Online Article Text |
id | pubmed-9660178 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9660178 2022-11-14 Shape-aware stochastic neighbor embedding for robust data visualisations Wängberg, Tobias Tyrcha, Joanna Li, Chun-Biu BMC Bioinformatics Research BACKGROUND: The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has emerged as one of the leading methods for visualising high-dimensional (HD) data in a wide variety of fields, especially for revealing cluster structure in HD single-cell transcriptomics data. However, t-SNE often fails to correctly represent hierarchical relationships between clusters and creates spurious patterns in the embedding. In this work we generalised t-SNE using shape-aware graph distances to mitigate some of the limitations of t-SNE. Although many methods have recently been proposed to circumvent the shortcomings of t-SNE, notably Uniform Manifold Approximation and Projection (UMAP) and Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE), we see a clear advantage of the proposed graph-based method. RESULTS: The superior performance of the proposed method is first demonstrated on simulated data, where a significant improvement over t-SNE, UMAP and PHATE, based on quantitative validation indices, is observed when visualising imbalanced, nonlinear, continuous and hierarchically structured data. Thereafter, the ability of the proposed method to create faithful low-dimensional embeddings, compared with the competing methods, is shown on two real-world data sets: single-cell transcriptomics data and the MNIST image data. In addition, the only hyper-parameter of the method can be chosen automatically in a data-driven way, and this choice is consistently optimal across all test cases in this study. CONCLUSIONS: In this work we show that the proposed shape-aware stochastic neighbor embedding method creates low-dimensional visualisations that robustly and accurately reveal key structures of high-dimensional data. 
BioMed Central 2022-11-14 /pmc/articles/PMC9660178/ /pubmed/36376789 http://dx.doi.org/10.1186/s12859-022-05028-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Wängberg, Tobias Tyrcha, Joanna Li, Chun-Biu Shape-aware stochastic neighbor embedding for robust data visualisations |
title | Shape-aware stochastic neighbor embedding for robust data visualisations |
title_full | Shape-aware stochastic neighbor embedding for robust data visualisations |
title_fullStr | Shape-aware stochastic neighbor embedding for robust data visualisations |
title_full_unstemmed | Shape-aware stochastic neighbor embedding for robust data visualisations |
title_short | Shape-aware stochastic neighbor embedding for robust data visualisations |
title_sort | shape-aware stochastic neighbor embedding for robust data visualisations |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9660178/ https://www.ncbi.nlm.nih.gov/pubmed/36376789 http://dx.doi.org/10.1186/s12859-022-05028-8 |