D3K: The Dissimilarity-Density-Dynamic Radius K-means Clustering Algorithm for scRNA-Seq Data

A single-cell sequencing data set has always been a challenge for clustering because of its high dimension and multi-noise points. The traditional K-means algorithm is not suitable for this type of data. Therefore, this study proposes a Dissimilarity-Density-Dynamic Radius-K-means clustering algorit...

Bibliographic Details
Main authors: Liu, Guoyun; Li, Manzhi; Wang, Hongtao; Lin, Shijun; Xu, Junlin; Li, Ruixi; Tang, Min; Li, Chun
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9284269/
https://www.ncbi.nlm.nih.gov/pubmed/35846121
http://dx.doi.org/10.3389/fgene.2022.912711
_version_ 1784747526227755008
author Liu, Guoyun
Li, Manzhi
Wang, Hongtao
Lin, Shijun
Xu, Junlin
Li, Ruixi
Tang, Min
Li, Chun
author_sort Liu, Guoyun
collection PubMed
description A single-cell sequencing data set has always been a challenge for clustering because of its high dimension and multi-noise points. The traditional K-means algorithm is not suitable for this type of data. Therefore, this study proposes a Dissimilarity-Density-Dynamic Radius-K-means clustering algorithm. The algorithm adds the dynamic radius parameter to the calculation. It flexibly adjusts the active radius according to the data characteristics, which can eliminate the influence of noise points and optimize the clustering results. At the same time, the algorithm calculates the weight through the dissimilarity density of the data set, the average contrast of candidate clusters, and the dissimilarity of candidate clusters. It obtains a set of high-quality initial center points, which solves the randomness of the K-means algorithm in selecting the center points. Finally, compared with similar algorithms, this algorithm shows a better clustering effect on single-cell data. Each clustering index is higher than other single-cell clustering algorithms, which overcomes the shortcomings of the traditional K-means algorithm.
format Online
Article
Text
id pubmed-9284269
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-92842692022-07-16 D3K: The Dissimilarity-Density-Dynamic Radius K-means Clustering Algorithm for scRNA-Seq Data Liu, Guoyun Li, Manzhi Wang, Hongtao Lin, Shijun Xu, Junlin Li, Ruixi Tang, Min Li, Chun Front Genet Genetics A single-cell sequencing data set has always been a challenge for clustering because of its high dimension and multi-noise points. The traditional K-means algorithm is not suitable for this type of data. Therefore, this study proposes a Dissimilarity-Density-Dynamic Radius-K-means clustering algorithm. The algorithm adds the dynamic radius parameter to the calculation. It flexibly adjusts the active radius according to the data characteristics, which can eliminate the influence of noise points and optimize the clustering results. At the same time, the algorithm calculates the weight through the dissimilarity density of the data set, the average contrast of candidate clusters, and the dissimilarity of candidate clusters. It obtains a set of high-quality initial center points, which solves the randomness of the K-means algorithm in selecting the center points. Finally, compared with similar algorithms, this algorithm shows a better clustering effect on single-cell data. Each clustering index is higher than other single-cell clustering algorithms, which overcomes the shortcomings of the traditional K-means algorithm. Frontiers Media S.A. 2022-07-01 /pmc/articles/PMC9284269/ /pubmed/35846121 http://dx.doi.org/10.3389/fgene.2022.912711 Text en Copyright © 2022 Liu, Li, Wang, Lin, Xu, Li, Tang and Li. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice.
No use, distribution or reproduction is permitted which does not comply with these terms.
title D3K: The Dissimilarity-Density-Dynamic Radius K-means Clustering Algorithm for scRNA-Seq Data
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9284269/
https://www.ncbi.nlm.nih.gov/pubmed/35846121
http://dx.doi.org/10.3389/fgene.2022.912711