
Robust large-scale clustering based on correntropy

With the explosive growth of data, efficiently clustering large-scale unlabeled data has become an urgent problem. This is especially true of large-scale real-world data, which contain large amounts of noise and outliers with complex distributions; the research...


Bibliographic Details
Main Authors:	Jin, Guodong; Gao, Jing; Tan, Lining
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9635755/ https://www.ncbi.nlm.nih.gov/pubmed/36331916 http://dx.doi.org/10.1371/journal.pone.0277012

collection	PubMed
description	With the explosive growth of data, efficiently clustering large-scale unlabeled data has become an urgent problem. Large-scale real-world data in particular contain large amounts of noise and outliers with complex distributions, so robust clustering algorithms for such data have become an active research topic. In response, this paper proposes a robust large-scale clustering algorithm based on correntropy (RLSCC). Specifically, k-means is first applied to generate pseudo-labels, which reduce the input data scale of the subsequent spectral clustering. Anchor graphs, instead of full sample graphs, are then introduced into spectral clustering to obtain the final clustering results from the pseudo-labels, which further improves efficiency. RLSCC therefore inherits the effectiveness of k-means and spectral clustering while greatly reducing the computational complexity. Furthermore, correntropy is used to suppress the influence of noise and outliers in real-world data, improving the robustness of clustering. Finally, extensive experiments were carried out on real-world datasets and noisy datasets. The results show that, compared with other state-of-the-art algorithms, RLSCC greatly improves efficiency and robustness while maintaining comparable or even higher clustering effectiveness.
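The abstract credits correntropy with suppressing noise and outliers. The sketch below is not the RLSCC implementation, which this record does not contain. It only illustrates the general idea, assuming the common Gaussian-kernel form of correntropy: a cluster centre estimated by maximising correntropy gives far-away points a weight close to zero, whereas the plain mean used by standard k-means is pulled toward them. The function names, the kernel width `sigma`, and the toy data are all hypothetical.

```python
import numpy as np


def gaussian_kernel(err, sigma):
    """Gaussian kernel of an error value (assumed kernel choice)."""
    return np.exp(-(err ** 2) / (2.0 * sigma ** 2))


def sample_correntropy(x, y, sigma=1.0):
    """Sample estimate of correntropy: mean kernel value of x - y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(gaussian_kernel(x - y, sigma)))


def correntropy_center(points, sigma=1.0, n_iter=50):
    """Fixed-point iteration for a centre that maximises correntropy.

    Each step is a weighted mean whose weights decay with squared
    distance, so distant outliers contribute almost nothing.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(n_iter):
        sq_dist = np.sum((points - center) ** 2, axis=1)
        weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
        center = (weights[:, None] * points).sum(axis=0) / weights.sum()
    return center


# Toy data (hypothetical): 50 inliers near the origin, 5 outliers at (10, 10).
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
outliers = np.full((5, 2), 10.0)
data = np.vstack([inliers, outliers])

plain_mean = data.mean(axis=0)
robust_center = correntropy_center(data, sigma=1.0)

print("plain mean:        ", plain_mean)
print("correntropy centre:", robust_center)
```

On this toy set the plain mean is dragged toward the outliers, while the correntropy-weighted centre stays near the inliers. The kernel width `sigma` sets how quickly a point's influence decays with distance, so it has to be chosen relative to the scale of the data.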
format	Online Article Text
id	pubmed-9635755
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9635755 2022-11-05 Robust large-scale clustering based on correntropy. Jin, Guodong; Gao, Jing; Tan, Lining. PLoS One. Research Article. Public Library of Science 2022-11-04 /pmc/articles/PMC9635755/ /pubmed/36331916 http://dx.doi.org/10.1371/journal.pone.0277012 Text en © 2022 Jin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
