Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data
Arguably one of the most famous dimensionality reduction algorithms of today is t-distributed stochastic neighbor embedding (t-SNE). Although being widely used for the visualization of scRNA-seq data, it is prone to errors as any algorithm and may lead to inaccurate interpretations of the visualized...
Main authors: | Ozgode Yigin, Busra; Saygili, Gorkem |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10121641/ https://www.ncbi.nlm.nih.gov/pubmed/37085593 http://dx.doi.org/10.1038/s41598-023-32966-x |
_version_ | 1785029410377695232 |
---|---|
author | Ozgode Yigin, Busra; Saygili, Gorkem |
author_facet | Ozgode Yigin, Busra; Saygili, Gorkem |
author_sort | Ozgode Yigin, Busra |
collection | PubMed |
description | t-distributed stochastic neighbor embedding (t-SNE) is arguably one of the best-known dimensionality reduction algorithms in use today. Although it is widely used for visualizing scRNA-seq data, it is prone to errors, like any algorithm, and may lead to inaccurate interpretations of the visualized data. A reasonable way to avoid misinterpretation is to quantify the reliability of the visualizations. This work first seeks the best way to predict sample-based confidence scores for t-SNE embeddings and then uses these confidence scores to improve clustering. We adopt a random forest (RF) regression algorithm that takes seven distance measures as features to predict the sample-based confidence scores. The best configuration is used to assess the clustering improvement with K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), based on Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and accuracy (ACC) scores. The experimental results show that the choice of distance measure has a considerable effect on the precision of the confidence scores, and that clustering performance can be improved substantially if these confidence scores are incorporated before clustering. Our findings demonstrate the usefulness of these confidence scores for downstream analyses of scRNA-seq data. |
format | Online Article Text |
id | pubmed-10121641 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10121641 2023-04-23 Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data Ozgode Yigin, Busra; Saygili, Gorkem Sci Rep Article t-distributed stochastic neighbor embedding (t-SNE) is arguably one of the best-known dimensionality reduction algorithms in use today. Although it is widely used for visualizing scRNA-seq data, it is prone to errors, like any algorithm, and may lead to inaccurate interpretations of the visualized data. A reasonable way to avoid misinterpretation is to quantify the reliability of the visualizations. This work first seeks the best way to predict sample-based confidence scores for t-SNE embeddings and then uses these confidence scores to improve clustering. We adopt a random forest (RF) regression algorithm that takes seven distance measures as features to predict the sample-based confidence scores. The best configuration is used to assess the clustering improvement with K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), based on Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and accuracy (ACC) scores. The experimental results show that the choice of distance measure has a considerable effect on the precision of the confidence scores, and that clustering performance can be improved substantially if these confidence scores are incorporated before clustering. Our findings demonstrate the usefulness of these confidence scores for downstream analyses of scRNA-seq data. 
Nature Publishing Group UK 2023-04-21 /pmc/articles/PMC10121641/ /pubmed/37085593 http://dx.doi.org/10.1038/s41598-023-32966-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Ozgode Yigin, Busra Saygili, Gorkem Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data |
title | Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10121641/ https://www.ncbi.nlm.nih.gov/pubmed/37085593 http://dx.doi.org/10.1038/s41598-023-32966-x |
work_keys_str_mv | AT ozgodeyiginbusra effectofdistancemeasuresonconfidencesoftsneembeddingsanditsimplicationsonclusteringforscrnaseqdata AT saygiligorkem effectofdistancemeasuresonconfidencesoftsneembeddingsanditsimplicationsonclusteringforscrnaseqdata |
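
The description in this record says that clusterings from K-means and DBSCAN are scored with ARI, NMI, and ACC. The following is a minimal sketch of that kind of evaluation, not the authors' pipeline: it assumes scikit-learn and SciPy, uses synthetic 2-D blobs as a stand-in for a t-SNE embedding, and does not compute or use confidence scores. The helper names `clustering_accuracy` and `evaluate`, the blob centers, and the DBSCAN parameters are all illustrative choices that do not come from the record.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of samples matched under the best one-to-one
    assignment of predicted cluster ids to true class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    true_ids, true_idx = np.unique(y_true, return_inverse=True)
    pred_ids, pred_idx = np.unique(y_pred, return_inverse=True)
    # Contingency table: rows are predicted clusters, columns are true classes.
    counts = np.zeros((pred_ids.size, true_ids.size), dtype=np.int64)
    np.add.at(counts, (pred_idx, true_idx), 1)
    # Hungarian algorithm, maximising the number of matched samples.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return float(counts[rows, cols].sum() / y_true.size)


def evaluate(y_true, y_pred):
    """Return the three scores named in the description."""
    return {
        "ARI": adjusted_rand_score(y_true, y_pred),
        "NMI": normalized_mutual_info_score(y_true, y_pred),
        "ACC": clustering_accuracy(y_true, y_pred),
    }


# Assumption: synthetic, well-separated 2-D points stand in for an embedding.
embedding, labels = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 10], [10, -10]],
    cluster_std=1.0,
    random_state=0,
)

kmeans_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding)
# DBSCAN labels noise points as -1; here they count as one extra "cluster".
dbscan_pred = DBSCAN(eps=1.5, min_samples=5).fit_predict(embedding)

kmeans_scores = evaluate(labels, kmeans_pred)
dbscan_scores = evaluate(labels, dbscan_pred)
print("K-means:", kmeans_scores)
print("DBSCAN: ", dbscan_scores)
```

ACC is computed with a Hungarian matching because cluster ids are arbitrary, so a direct label comparison would understate agreement. ARI and NMI are already invariant to relabeling and need no such step.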