Cargando…

Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data

Arguably one of the most famous dimensionality reduction algorithms of today is t-distributed stochastic neighbor embedding (t-SNE). Although being widely used for the visualization of scRNA-seq data, it is prone to errors as any algorithm and may lead to inaccurate interpretations of the visualized...

Descripción completa

Detalles Bibliográficos
Autores principales: Ozgode Yigin, Busra, Saygili, Gorkem
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Nature Publishing Group UK 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10121641/
https://www.ncbi.nlm.nih.gov/pubmed/37085593
http://dx.doi.org/10.1038/s41598-023-32966-x
Descripción
Sumario:Arguably one of the most famous dimensionality reduction algorithms of today is t-distributed stochastic neighbor embedding (t-SNE). Although being widely used for the visualization of scRNA-seq data, it is prone to errors as any algorithm and may lead to inaccurate interpretations of the visualized data. A reasonable way to avoid misinterpretations is to quantify the reliability of the visualizations. The focus of this work is first to find the best possible way to predict sample-based confidence scores for t-SNE embeddings and next, to use these confidence scores to improve the clustering algorithms. We adopt an RF regression algorithm using seven distance measures as features for having the sample-based confidence scores with a variety of different distance measures. The best configuration is used to assess the clustering improvement using K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) based on Adjusted Rank Index (ARI), Normalized Mutual Information (NMI), and accuracy (ACC) scores. The experimental results show that distance measures have a considerable effect on the precision of confidence scores and clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm. Our findings reveal the usefulness of these confidence scores on downstream analyses for scRNA-seq data.