
A clustering method for small scRNA-seq data based on subspace and weighted distance

BACKGROUND: Identifying cell types with unsupervised methods is essential for scRNA-seq research. However, conventional similarity measures make single-cell data clustering challenging because the data are high-dimensional, noisy, and have a high dropout rate. METHODS: We proposed a clustering method...


Bibliographic Details
Main authors: Ning, Zilan, Dai, Zhijun, Zhang, Hongyan, Chen, Yuan, Yuan, Zheming
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9879162/
https://www.ncbi.nlm.nih.gov/pubmed/36710872
http://dx.doi.org/10.7717/peerj.14706
_version_ 1784878634900652032
author Ning, Zilan
Dai, Zhijun
Zhang, Hongyan
Chen, Yuan
Yuan, Zheming
collection PubMed
description BACKGROUND: Identifying cell types with unsupervised methods is essential for scRNA-seq research. However, conventional similarity measures make single-cell data clustering challenging because the data are high-dimensional, noisy, and have a high dropout rate. METHODS: We proposed a clustering method for small scRNA-seq data based on Subspace and Weighted Distance (SSWD), which assumes that gene subspaces composed of genes with similar density distributions can better distinguish cell groups. To accurately capture the intrinsic relationships among cells or genes, we proposed a new distance metric that combines Euclidean and Pearson distances through a weighting strategy. The relative Calinski-Harabasz (CH) index, rather than the CH index itself, was used to estimate the number of clusters because it is comparable across degrees of freedom. RESULTS: We compared SSWD with seven prevailing methods on eight publicly available scRNA-seq datasets. The experimental results show that SSWD achieves better clustering accuracy and partitions cell groups more effectively. SSWD can be downloaded at https://github.com/ningzilan/SSWD.
format Online
Article
Text
id pubmed-9879162
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-9879162 2023-01-27 A clustering method for small scRNA-seq data based on subspace and weighted distance Ning, Zilan Dai, Zhijun Zhang, Hongyan Chen, Yuan Yuan, Zheming PeerJ Bioinformatics BACKGROUND: Identifying cell types with unsupervised methods is essential for scRNA-seq research. However, conventional similarity measures make single-cell data clustering challenging because the data are high-dimensional, noisy, and have a high dropout rate. METHODS: We proposed a clustering method for small scRNA-seq data based on Subspace and Weighted Distance (SSWD), which assumes that gene subspaces composed of genes with similar density distributions can better distinguish cell groups. To accurately capture the intrinsic relationships among cells or genes, we proposed a new distance metric that combines Euclidean and Pearson distances through a weighting strategy. The relative Calinski-Harabasz (CH) index, rather than the CH index itself, was used to estimate the number of clusters because it is comparable across degrees of freedom. RESULTS: We compared SSWD with seven prevailing methods on eight publicly available scRNA-seq datasets. The experimental results show that SSWD achieves better clustering accuracy and partitions cell groups more effectively. SSWD can be downloaded at https://github.com/ningzilan/SSWD. PeerJ Inc. 2023-01-23 /pmc/articles/PMC9879162/ /pubmed/36710872 http://dx.doi.org/10.7717/peerj.14706 Text en ©2023 Ning et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title A clustering method for small scRNA-seq data based on subspace and weighted distance
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9879162/
https://www.ncbi.nlm.nih.gov/pubmed/36710872
http://dx.doi.org/10.7717/peerj.14706