
Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data

t-distributed stochastic neighbor embedding (t-SNE) is arguably one of the most famous dimensionality reduction algorithms in use today. Although it is widely used to visualize scRNA-seq data, it is prone to errors like any algorithm and may lead to inaccurate interpretations of the visualized...


Bibliographic Details
Main authors: Ozgode Yigin, Busra, Saygili, Gorkem
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10121641/
https://www.ncbi.nlm.nih.gov/pubmed/37085593
http://dx.doi.org/10.1038/s41598-023-32966-x
_version_ 1785029410377695232
author Ozgode Yigin, Busra
Saygili, Gorkem
author_facet Ozgode Yigin, Busra
Saygili, Gorkem
author_sort Ozgode Yigin, Busra
collection PubMed
description t-distributed stochastic neighbor embedding (t-SNE) is arguably one of the most famous dimensionality reduction algorithms in use today. Although it is widely used to visualize scRNA-seq data, it is prone to errors like any algorithm and may lead to inaccurate interpretations of the visualized data. A reasonable way to avoid misinterpretations is to quantify the reliability of the visualizations. This work first seeks the best way to predict sample-based confidence scores for t-SNE embeddings, and then uses these confidence scores to improve clustering algorithms. We adopt a random forest (RF) regression algorithm that uses seven distance measures as features to obtain the sample-based confidence scores. The best configuration is used to assess the clustering improvement for K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), based on Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and accuracy (ACC) scores. The experimental results show that the choice of distance measure has a considerable effect on the precision of the confidence scores, and that clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm is run. Our findings demonstrate the usefulness of these confidence scores for downstream analyses of scRNA-seq data.
format Online
Article
Text
id pubmed-10121641
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10121641 2023-04-23 Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data Ozgode Yigin, Busra Saygili, Gorkem Sci Rep Article
Nature Publishing Group UK 2023-04-21 /pmc/articles/PMC10121641/ /pubmed/37085593 http://dx.doi.org/10.1038/s41598-023-32966-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Ozgode Yigin, Busra
Saygili, Gorkem
Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data
title Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data
title_sort effect of distance measures on confidences of t-sne embeddings and its implications on clustering for scrna-seq data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10121641/
https://www.ncbi.nlm.nih.gov/pubmed/37085593
http://dx.doi.org/10.1038/s41598-023-32966-x
work_keys_str_mv AT ozgodeyiginbusra effectofdistancemeasuresonconfidencesoftsneembeddingsanditsimplicationsonclusteringforscrnaseqdata
AT saygiligorkem effectofdistancemeasuresonconfidencesoftsneembeddingsanditsimplicationsonclusteringforscrnaseqdata
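
The description field above evaluates clustering with the Adjusted Rand Index (ARI) and reports that incorporating confidence scores before clustering helps. The record does not say how the scores are incorporated, so the following is only a minimal sketch under stated assumptions: `adjusted_rand_index` is a plain standard-library implementation of the usual ARI formula, and `keep_confident` is a hypothetical helper that simply drops samples whose confidence falls below a threshold. Neither is taken from the article.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treat as trivial agreement

    # Pairs of samples that share a cell of the contingency table,
    # a true class, or a predicted cluster, respectively.
    same_cell = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = same_true * same_pred / total_pairs
    maximum = 0.5 * (same_true + same_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (same_cell - expected) / (maximum - expected)


def keep_confident(labels_true, labels_pred, confidences, threshold):
    """Hypothetical filter (an assumption, not the article's method):
    keep only samples whose confidence score is at least `threshold`."""
    kept = [
        (t, p)
        for t, p, c in zip(labels_true, labels_pred, confidences)
        if c >= threshold
    ]
    return [t for t, _ in kept], [p for _, p in kept]


if __name__ == "__main__":
    # Toy labels and confidence scores, invented purely for illustration.
    truth = [0, 0, 1, 1]
    predicted = [0, 0, 1, 0]
    confidence = [0.9, 0.8, 0.7, 0.1]

    print("ARI, all samples:", adjusted_rand_index(truth, predicted))
    kept_truth, kept_pred = keep_confident(truth, predicted, confidence, 0.5)
    print("ARI, confident samples only:", adjusted_rand_index(kept_truth, kept_pred))
```

ARI is invariant to how clusters are named, so a labeling that swaps cluster identifiers still scores as a perfect match. The toy data and the 0.5 threshold are arbitrary choices for the sketch and say nothing about the article's actual results.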