J-score: a robust measure of clustering accuracy

Bibliographic details
Main authors:	Ahmadinejad, Navid; Chung, Yunro; Liu, Li
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc., 2023
Subjects:	Algorithms and Analysis of Algorithms
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10495964/ https://www.ncbi.nlm.nih.gov/pubmed/37705621 http://dx.doi.org/10.7717/peerj-cs.1545

_version_	1785105006425276416
author	Ahmadinejad, Navid; Chung, Yunro; Liu, Li
author_sort	Ahmadinejad, Navid
collection	PubMed
description	BACKGROUND: Clustering analysis discovers hidden structures in a data set by partitioning it into disjoint clusters. Robust accuracy measures that evaluate the goodness of clustering results are critical for algorithm development and model diagnosis. Common problems of clustering accuracy measures include overlooking unmatched clusters, biases towards excessive clusters, unstable baselines, and difficulty of interpretation. In this study, we present a novel accuracy measure, J-score, to address these issues. METHODS: Given a data set with known class labels, J-score quantifies how well the hypothetical clusters produced by clustering analysis recover the true classes. It starts with bidirectional set matching to identify the correspondence between true classes and hypothetical clusters based on the Jaccard index. It then computes two weighted sums of Jaccard indices measuring the reconciliation from classes to clusters and vice versa. The final J-score is the harmonic mean of the two weighted sums. RESULTS: Through simulation studies and analyses of real data sets, we evaluated the performance of J-score and compared it with existing measures. Our results show that J-score effectively distinguishes partition structures that differ only by unmatched clusters, rewards correct inference of the number of classes, addresses biases towards excessive clusters, and has a relatively stable baseline. Its simple calculation makes it straightforward to interpret. J-score is a valuable complement to other accuracy measures. We released an R/jScore package implementing the algorithm.
format	Online Article Text
id	pubmed-10495964
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-10495964 2023-09-13 J-score: a robust measure of clustering accuracy Ahmadinejad, Navid Chung, Yunro Liu, Li PeerJ Comput Sci Algorithms and Analysis of Algorithms PeerJ Inc. 2023-09-04 /pmc/articles/PMC10495964/ /pubmed/37705621 http://dx.doi.org/10.7717/peerj-cs.1545 Text en ©2023 Ahmadinejad et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	J-score: a robust measure of clustering accuracy
topic	Algorithms and Analysis of Algorithms
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10495964/ https://www.ncbi.nlm.nih.gov/pubmed/37705621 http://dx.doi.org/10.7717/peerj-cs.1545
work_keys_str_mv	AT ahmadinejadnavid jscorearobustmeasureofclusteringaccuracy AT chungyunro jscorearobustmeasureofclusteringaccuracy AT liuli jscorearobustmeasureofclusteringaccuracy
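
The METHODS summary in the description field can be made concrete with a short Python sketch. It is a hypothetical illustration, not the authors' R/jScore code: it assumes that "bidirectional set matching" pairs each true class with the hypothetical cluster of highest Jaccard index (and each cluster with its best-matching class), and that each matched Jaccard index is weighted by the size of its set divided by the number of data points. The function name `j_score_sketch` is made up for this example.

```python
from collections import Counter
from typing import Hashable, Sequence


def j_score_sketch(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Hypothetical J-score sketch (not the R/jScore implementation).

    Assumes best-Jaccard matching in each direction and size-proportional
    weights; returns the harmonic mean of the two weighted sums.
    """
    if len(truth) != len(pred) or len(truth) == 0:
        raise ValueError("truth and pred must be non-empty and of equal length")

    n = len(truth)
    class_sizes = Counter(truth)         # |T_i| for each true class T_i
    cluster_sizes = Counter(pred)        # |C_j| for each hypothetical cluster C_j
    overlap = Counter(zip(truth, pred))  # |T_i ∩ C_j|; missing pairs count as 0

    def jaccard(t: Hashable, c: Hashable) -> float:
        inter = overlap[(t, c)]
        # |T ∪ C| = |T| + |C| - |T ∩ C|, which is always >= 1 here
        return inter / (class_sizes[t] + cluster_sizes[c] - inter)

    # Classes -> clusters: match each class to its best cluster (assumption),
    # weighted by the class's share of the data points.
    class_side = sum(
        class_sizes[t] / n * max(jaccard(t, c) for c in cluster_sizes)
        for t in class_sizes
    )
    # Clusters -> classes: match each cluster to its best class (assumption),
    # weighted by the cluster's share of the data points.
    cluster_side = sum(
        cluster_sizes[c] / n * max(jaccard(t, c) for t in class_sizes)
        for c in cluster_sizes
    )
    # Both sums are > 0 for non-empty input, so the harmonic mean is defined.
    return 2 * class_side * cluster_side / (class_side + cluster_side)


if __name__ == "__main__":
    print(j_score_sketch([0, 0, 1, 1], ["a", "a", "b", "b"]))  # 1.0
    print(j_score_sketch([0, 0, 0, 1], [0, 0, 1, 1]))          # ~0.6034 (= 35/58)
```

Because only the partition matters, relabeling the clusters (for example `0, 1` versus `"a", "b"`) does not change the score. Taking a harmonic mean means that a poor match in either direction pulls the score down, much as an F1 score does for precision and recall.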