
DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling

This work studies the problem of clustering one-dimensional data points such that they are evenly distributed over a given number of low variance clusters. One application is the visualization of data on choropleth maps or on business process models, but without over-emphasizing outliers. This enabl...


Bibliographic Details
Main authors: Lux, Marian, Rinderle-Ma, Stefanie
Format: Online Article Text
Language: English
Published: Springer US 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9873542/
https://www.ncbi.nlm.nih.gov/pubmed/36713890
http://dx.doi.org/10.1007/s00357-022-09428-6
_version_ 1784877619924172800
author Lux, Marian
Rinderle-Ma, Stefanie
author_facet Lux, Marian
Rinderle-Ma, Stefanie
author_sort Lux, Marian
collection PubMed
description This work studies the problem of clustering one-dimensional data points such that they are evenly distributed over a given number of low variance clusters. One application is the visualization of data on choropleth maps or on business process models, but without over-emphasizing outliers. This enables the detection and differentiation of smaller clusters. The problem is tackled based on a heuristic algorithm called DDCAL (1d distribution cluster algorithm) that is based on iterative feature scaling which generates stable results of clusters. The effectiveness of the DDCAL algorithm is shown based on 5 artificial data sets with different distributions and 4 real-world data sets reflecting different use cases. Moreover, the results from DDCAL, by using these data sets, are compared to 11 existing clustering algorithms. The application of the DDCAL algorithm is illustrated through the visualization of pandemic and population data on choropleth maps as well as process mining results on process models.
format Online
Article
Text
id pubmed-9873542
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-9873542 2023-01-25 DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling Lux, Marian Rinderle-Ma, Stefanie J Classif Article This work studies the problem of clustering one-dimensional data points such that they are evenly distributed over a given number of low variance clusters. One application is the visualization of data on choropleth maps or on business process models, but without over-emphasizing outliers. This enables the detection and differentiation of smaller clusters. The problem is tackled based on a heuristic algorithm called DDCAL (1d distribution cluster algorithm) that is based on iterative feature scaling which generates stable results of clusters. The effectiveness of the DDCAL algorithm is shown based on 5 artificial data sets with different distributions and 4 real-world data sets reflecting different use cases. Moreover, the results from DDCAL, by using these data sets, are compared to 11 existing clustering algorithms. The application of the DDCAL algorithm is illustrated through the visualization of pandemic and population data on choropleth maps as well as process mining results on process models. Springer US 2023-01-25 2023 /pmc/articles/PMC9873542/ /pubmed/36713890 http://dx.doi.org/10.1007/s00357-022-09428-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Lux, Marian
Rinderle-Ma, Stefanie
DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling
title DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling
title_full DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling
title_fullStr DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling
title_full_unstemmed DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling
title_short DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling
title_sort ddcal: evenly distributing data into low variance clusters based on iterative feature scaling
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9873542/
https://www.ncbi.nlm.nih.gov/pubmed/36713890
http://dx.doi.org/10.1007/s00357-022-09428-6
work_keys_str_mv AT luxmarian ddcalevenlydistributingdataintolowvarianceclustersbasedoniterativefeaturescaling
AT rinderlemastefanie ddcalevenlydistributingdataintolowvarianceclustersbasedoniterativefeaturescaling