Stability estimation for unsupervised clustering: A review

Cluster analysis remains one of the most challenging yet fundamental tasks in unsupervised learning. This is due in part to the fact that there are no labels or gold standards by which performance can be measured. Moreover, the wide range of clustering methods available is governed by different obje...

Bibliographic Details
Main authors:	Liu, Tianmou; Yu, Han; Blair, Rachael Hageman
Format:	Online Article Text
Language:	English
Published:	John Wiley & Sons, Inc. 2022
Subjects:	Advanced Reviews
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787023/ https://www.ncbi.nlm.nih.gov/pubmed/36583207 http://dx.doi.org/10.1002/wics.1575

author	Liu, Tianmou; Yu, Han; Blair, Rachael Hageman
collection	PubMed
description	Cluster analysis remains one of the most challenging yet fundamental tasks in unsupervised learning. This is due in part to the fact that there are no labels or gold standards by which performance can be measured. Moreover, the wide range of clustering methods available is governed by different objective functions, different parameters, and dissimilarity measures. Clustering serves versatile purposes, often playing critical roles in the early stages of exploratory data analysis and as an endpoint for knowledge and discovery. Thus, understanding the quality of a clustering is of critical importance. The concept of stability has emerged as a strategy for assessing the performance and reproducibility of data clustering. The key idea is to produce perturbed data sets that are very close to the original, and cluster them. If the clustering is stable, then the clusters from the original data will be preserved in the perturbed data clustering. The nature of the perturbation and the methods for quantifying similarity between clusterings are nontrivial, and are ultimately what set many of the stability estimation methods apart. In this review, we provide an overview of the very active research area of cluster stability estimation and discuss some of the open questions and challenges that remain in the field. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification.
format	Online Article Text
id	pubmed-9787023
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	John Wiley & Sons, Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-9787023 2022-12-27 Wiley Interdiscip Rev Comput Stat Advanced Reviews John Wiley & Sons, Inc. 2022-01-09 2022 /pmc/articles/PMC9787023/ /pubmed/36583207 http://dx.doi.org/10.1002/wics.1575 Text en © 2022 The Authors. WIREs Computational Statistics published by Wiley Periodicals LLC. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
title	Stability estimation for unsupervised clustering: A review
topic	Advanced Reviews
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787023/ https://www.ncbi.nlm.nih.gov/pubmed/36583207 http://dx.doi.org/10.1002/wics.1575
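
The key idea stated in the description (perturb the data, cluster the perturbed sets, and check whether the original clusters are preserved) can be illustrated with a minimal sketch. It is not taken from the review. The bootstrap perturbation, the small k-means routine, the Rand index as the similarity measure, and all function names (`kmeans`, `rand_index`, `stability`, `two_blobs`) are assumptions chosen purely for illustration.

```python
import random
from itertools import combinations


def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))


def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns the list of k centroids."""
    # Assumption: farthest-point initialisation, to keep the sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, n_boot=20, seed=0):
    """Mean Rand index between the original clustering and bootstrap clusterings."""
    rng = random.Random(seed)
    ref_centroids = kmeans(points, k)
    reference = [nearest(p, ref_centroids) for p in points]
    scores = []
    for _ in range(n_boot):
        # Perturbation: resample the data with replacement.
        sample = [rng.choice(points) for _ in points]
        centroids = kmeans(sample, k)
        # Extend the perturbed clustering to the original points for comparison.
        labels = [nearest(p, centroids) for p in points]
        scores.append(rand_index(reference, labels))
    return sum(scores) / len(scores)


def two_blobs(n, seed=0):
    """Toy data: n points around (0, 0) and n points around (10, 10)."""
    rng = random.Random(seed)
    return [
        (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in ((0, 0), (10, 10))
        for _ in range(n)
    ]
```

Calling `stability(two_blobs(20), k=2)` scores the two-cluster solution on the toy data, and repeating the call for other values of `k` gives one simple way to compare candidate cluster counts. Other perturbations (subsampling, added noise) and other similarity measures could be swapped in, which is where, according to the description, stability estimation methods differ.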
