
A comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure

QUESTIONS: Most clustering methods assume data are structured as discrete hyperspheroidal clusters to be evaluated by measures of central tendency. If vegetation data do not conform to this model, then vegetation data may be clustered incorrectly. What are the implications for cluster stability and...


Bibliographic Details
Main Authors:	Tozer, Mark; Keith, David
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2022
Subjects:	Research Articles
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9674469/ https://www.ncbi.nlm.nih.gov/pubmed/36415880 http://dx.doi.org/10.1002/ece3.9496

_version_	1784833159747076096
author	Tozer, Mark Keith, David
author_facet	Tozer, Mark Keith, David
author_sort	Tozer, Mark
collection	PubMed
description	QUESTIONS: Most clustering methods assume data are structured as discrete hyperspheroidal clusters to be evaluated by measures of central tendency. If vegetation data do not conform to this model, then vegetation data may be clustered incorrectly. What are the implications for cluster stability and evaluation if clusters are of irregular shape or density? LOCATION: Southeast Australia. METHODS: We define misplacement as the placement of a sample in a cluster other than (distinct from) its nearest neighbor and hypothesize that optimizing homogeneity incurs the cost of higher rates of misplacement. Chameleon is a graph‐theoretic algorithm that emphasizes interconnectivity and thus is sensitive to the shape and distribution of clusters. We contrasted its solutions with those of traditional nonhierarchical and hierarchical (agglomerative and divisive) approaches. RESULTS: Chameleon‐derived solutions had lower rates of misplacement and only marginally higher heterogeneity than those of k‐means in the range of 15–60 clusters, but their metrics converged with larger numbers of clusters. Solutions derived by agglomerative clustering had the best metrics (and divisive clustering the worst) but both produced inferior high‐level solutions to those of Chameleon by merging distantly‐related clusters. CONCLUSIONS: Graph‐theoretic algorithms, such as Chameleon, have an advantage over traditional algorithms when data exhibit discontinuities and variable structure, typically producing more stable solutions (due to lower rates of misplacement) but scoring lower on traditional metrics of central tendency. Advantages are less obvious in the partitioning of data from continuous gradients; however, graph‐based partitioning protocols facilitate the hierarchical integration of solutions.
format	Online Article Text
id	pubmed-9674469
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-96744692022-11-21 A comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure Tozer, Mark Keith, David Ecol Evol Research Articles QUESTIONS: Most clustering methods assume data are structured as discrete hyperspheroidal clusters to be evaluated by measures of central tendency. If vegetation data do not conform to this model, then vegetation data may be clustered incorrectly. What are the implications for cluster stability and evaluation if clusters are of irregular shape or density? LOCATION: Southeast Australia. METHODS: We define misplacement as the placement of a sample in a cluster other than (distinct from) its nearest neighbor and hypothesize that optimizing homogeneity incurs the cost of higher rates of misplacement. Chameleon is a graph‐theoretic algorithm that emphasizes interconnectivity and thus is sensitive to the shape and distribution of clusters. We contrasted its solutions with those of traditional nonhierarchical and hierarchical (agglomerative and divisive) approaches. RESULTS: Chameleon‐derived solutions had lower rates of misplacement and only marginally higher heterogeneity than those of k‐means in the range of 15–60 clusters, but their metrics converged with larger numbers of clusters. Solutions derived by agglomerative clustering had the best metrics (and divisive clustering the worst) but both produced inferior high‐level solutions to those of Chameleon by merging distantly‐related clusters. CONCLUSIONS: Graph‐theoretic algorithms, such as Chameleon, have an advantage over traditional algorithms when data exhibit discontinuities and variable structure, typically producing more stable solutions (due to lower rates of misplacement) but scoring lower on traditional metrics of central tendency. 
Advantages are less obvious in the partitioning of data from continuous gradients; however, graph‐based partitioning protocols facilitate the hierarchical integration of solutions. John Wiley and Sons Inc. 2022-11-18 /pmc/articles/PMC9674469/ /pubmed/36415880 http://dx.doi.org/10.1002/ece3.9496 Text en © 2022 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Articles Tozer, Mark Keith, David A comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure
title	A comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure
title_full	A comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure
title_fullStr	A comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure
title_full_unstemmed	A comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure
title_short	A comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure
title_sort	comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure
topic	Research Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9674469/ https://www.ncbi.nlm.nih.gov/pubmed/36415880 http://dx.doi.org/10.1002/ece3.9496
work_keys_str_mv	AT tozermark acomparisonofcentraltendencyandinterconnectivityapproachestoclusteringmultivariatedatawithirregularstructure AT keithdavid acomparisonofcentraltendencyandinterconnectivityapproachestoclusteringmultivariatedatawithirregularstructure AT tozermark comparisonofcentraltendencyandinterconnectivityapproachestoclusteringmultivariatedatawithirregularstructure AT keithdavid comparisonofcentraltendencyandinterconnectivityapproachestoclusteringmultivariatedatawithirregularstructure
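
Illustrative note: the abstract defines misplacement as the placement of a sample in a cluster other than that of its nearest neighbor. The sketch below shows one way such a rate could be computed. It is not the authors' implementation: the function name `misplacement_rate` is hypothetical, and Euclidean distance is an assumption, since this record does not state which dissimilarity measure the study used.

```python
import math


def misplacement_rate(points, labels):
    """Fraction of samples whose nearest neighbor carries a different cluster label.

    Assumptions (not taken from the record): Euclidean distance, and ties
    between equally near neighbors resolved in favor of the lowest index.
    """
    n = len(points)
    if n != len(labels):
        raise ValueError("points and labels must have the same length")
    if n < 2:
        raise ValueError("at least two samples are needed")

    misplaced = 0
    for i in range(n):
        # Nearest neighbor of sample i, excluding the sample itself.
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        if labels[nearest] != labels[i]:
            misplaced += 1
    return misplaced / n


# Five samples on a line. The third sample sits closest to the second,
# so assigning it to the right-hand cluster counts as one misplacement.
samples = [(0.0,), (1.0,), (2.5,), (10.0,), (11.0,)]
print(misplacement_rate(samples, [0, 0, 1, 1, 1]))  # one of five samples misplaced
print(misplacement_rate(samples, [0, 0, 0, 1, 1]))  # no sample misplaced
```

Under this definition a solution can score well on within-cluster homogeneity and still have a high misplacement rate, which is the trade-off the abstract hypothesizes.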
