Clustering benchmark datasets exploiting the fundamental clustering problems

The Fundamental Clustering Problems Suite (FCPS) offers a variety of clustering challenges that any algorithm should be able to handle given real-world data. The FCPS consists of datasets with known a priori classifications that are to be reproduced by the algorithm. The datasets are intentionally c...

Bibliographic Details
Main Authors: Thrun, Michael C., Ultsch, Alfred
Format: Online Article Text
Language: English
Published: Elsevier 2020
Subjects: Computer Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7195520/
https://www.ncbi.nlm.nih.gov/pubmed/32373681
http://dx.doi.org/10.1016/j.dib.2020.105501
author Thrun, Michael C.
Ultsch, Alfred
collection PubMed
description The Fundamental Clustering Problems Suite (FCPS) offers a variety of clustering challenges that any algorithm should be able to handle given real-world data. The FCPS consists of datasets with known a priori classifications that are to be reproduced by the algorithm. The datasets are intentionally created to be visualized in two or three dimensions under the hypothesis that objects can be grouped unambiguously by the human eye. Each dataset represents a certain problem that can be solved by known clustering algorithms with varying success. In the R package “Fundamental Clustering Problems Suite” on CRAN, user-defined sample sizes can be drawn for the FCPS. Additionally, the distances of two high-dimensional datasets called Leukemia and Tetragonula are provided here. This collection is useful for investigating the shortcomings of clustering algorithms and the limitations of dimensionality reduction methods in the case of three-dimensional or higher datasets. This article is a simultaneous co-submission with Swarm Intelligence for Self-Organized Clustering [1].
format Online
Article
Text
id pubmed-7195520
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-7195520 2020-05-05 Data Brief Computer Science Elsevier 2020-04-20 /pmc/articles/PMC7195520/ /pubmed/32373681 http://dx.doi.org/10.1016/j.dib.2020.105501 Text en © 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Clustering benchmark datasets exploiting the fundamental clustering problems
topic Computer Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7195520/
https://www.ncbi.nlm.nih.gov/pubmed/32373681
http://dx.doi.org/10.1016/j.dib.2020.105501
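
The description above states that each FCPS dataset comes with a known a priori classification that a clustering algorithm is expected to reproduce. One way such a comparison could be scored is sketched below, assuming the clustering output is a list of arbitrary cluster labels. The function name `best_match_accuracy` and the brute-force label matching are illustrative assumptions, not the authors' evaluation method and not part of the FCPS R package.

```python
from itertools import permutations

# Sentinel for clusters that are left without a matching class (hypothetical helper).
_UNMATCHED = object()


def best_match_accuracy(true_labels, cluster_labels):
    """Share of points whose cluster agrees with the a priori class,
    under the best one-to-one assignment of clusters to classes.

    Cluster names are arbitrary, so every assignment is tried by brute
    force. This is only practical for a handful of clusters.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    # Unique labels in order of first appearance (no sorting, so mixed types work).
    classes = list(dict.fromkeys(true_labels))
    clusters = list(dict.fromkeys(cluster_labels))

    # If there are more clusters than classes, the surplus clusters stay unmatched.
    candidates = classes + [_UNMATCHED] * max(0, len(clusters) - len(classes))

    best_hits = 0
    for assignment in permutations(candidates, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


if __name__ == "__main__":
    prior = [1, 1, 1, 2, 2, 2]               # a priori classification
    found = ["b", "b", "a", "a", "a", "a"]   # output of some clustering algorithm
    print(best_match_accuracy(prior, found))
```

Because the number of assignments grows factorially with the number of clusters, a sketch like this only suits datasets with a few classes; larger cases would need an assignment solver or a permutation-invariant index instead.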