Beyond central‐tendency: If we agree discrete vegetation communities do not exist, should we investigate other methods of clustering?

Clustering is indispensable in the quest for robust vegetation classification schemes that aim to partition, summarise and communicate patterns. However, clustering solutions are sensitive to methods and data and are therefore unstable, a feature that is usually attributed to noise. Viewed through a...

Bibliographic Details
Main Authors: Tozer, Mark G., Keith, David A.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10659940/
https://www.ncbi.nlm.nih.gov/pubmed/38020702
http://dx.doi.org/10.1002/ece3.10757
_version_ 1785137656197283840
author Tozer, Mark G.
Keith, David A.
author_facet Tozer, Mark G.
Keith, David A.
author_sort Tozer, Mark G.
collection PubMed
description Clustering is indispensable in the quest for robust vegetation classification schemes that aim to partition, summarise and communicate patterns. However, clustering solutions are sensitive to methods and data and are therefore unstable, a feature that is usually attributed to noise. Viewed through a central‐tendency lens, noise is defined as the degree of departure from type, which is problematic since vegetation types are abstractions of continua, and so noise can only be quantified relative to the particular solution at hand. Graph theory models the structure of vegetation data based on the interconnectivity of samples. Through a graph‐theoretic lens, the causes of instability can be quantified in absolute terms via the degree of connectivity among objects. We simulated incremental increases in sampling intensity in a dataset over five iterations and assessed classification stability across successive solutions derived using algorithms implementing, respectively, models of central‐tendency and interconnectivity. We used logistic regression to model the likelihood of a sample changing groups between iterations as a function of distance to the centroid and degree of interconnectivity. Our results show that the degree to which samples are interconnected is a more powerful predictor of instability than the degree to which they deviate from their nearest centroid. The removal of weakly interconnected samples resulted in more stable classifications, although solutions with many clusters were apparently inherently less stable than those with few clusters, and improvements in stability flowing from the removal of outliers declined as the number of clusters increased. Our results reinforce the fact that clusters abstracted from continuous data are inherently unstable and that the quest for stable, fine‐scale classifications from large regional datasets is illusory. Nevertheless, our results show that using models better suited to the analysis of continuous data may yield more stable classifications of the available data.
format Online
Article
Text
id pubmed-10659940
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-10659940 2023-11-01 Beyond central‐tendency: If we agree discrete vegetation communities do not exist, should we investigate other methods of clustering? Tozer, Mark G. Keith, David A. Ecol Evol Research Articles Clustering is indispensable in the quest for robust vegetation classification schemes that aim to partition, summarise and communicate patterns. However, clustering solutions are sensitive to methods and data and are therefore unstable, a feature that is usually attributed to noise. Viewed through a central‐tendency lens, noise is defined as the degree of departure from type, which is problematic since vegetation types are abstractions of continua, and so noise can only be quantified relative to the particular solution at hand. Graph theory models the structure of vegetation data based on the interconnectivity of samples. Through a graph‐theoretic lens, the causes of instability can be quantified in absolute terms via the degree of connectivity among objects. We simulated incremental increases in sampling intensity in a dataset over five iterations and assessed classification stability across successive solutions derived using algorithms implementing, respectively, models of central‐tendency and interconnectivity. We used logistic regression to model the likelihood of a sample changing groups between iterations as a function of distance to the centroid and degree of interconnectivity. Our results show that the degree to which samples are interconnected is a more powerful predictor of instability than the degree to which they deviate from their nearest centroid. The removal of weakly interconnected samples resulted in more stable classifications, although solutions with many clusters were apparently inherently less stable than those with few clusters, and improvements in stability flowing from the removal of outliers declined as the number of clusters increased. Our results reinforce the fact that clusters abstracted from continuous data are inherently unstable and that the quest for stable, fine‐scale classifications from large regional datasets is illusory. Nevertheless, our results show that using models better suited to the analysis of continuous data may yield more stable classifications of the available data. John Wiley and Sons Inc. 2023-11-20 /pmc/articles/PMC10659940/ /pubmed/38020702 http://dx.doi.org/10.1002/ece3.10757 Text en © 2023 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Articles
Tozer, Mark G.
Keith, David A.
Beyond central‐tendency: If we agree discrete vegetation communities do not exist, should we investigate other methods of clustering?
title Beyond central‐tendency: If we agree discrete vegetation communities do not exist, should we investigate other methods of clustering?
title_full Beyond central‐tendency: If we agree discrete vegetation communities do not exist, should we investigate other methods of clustering?
title_fullStr Beyond central‐tendency: If we agree discrete vegetation communities do not exist, should we investigate other methods of clustering?
title_full_unstemmed Beyond central‐tendency: If we agree discrete vegetation communities do not exist, should we investigate other methods of clustering?
title_short Beyond central‐tendency: If we agree discrete vegetation communities do not exist, should we investigate other methods of clustering?
title_sort beyond central‐tendency: if we agree discrete vegetation communities do not exist, should we investigate other methods of clustering?
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10659940/
https://www.ncbi.nlm.nih.gov/pubmed/38020702
http://dx.doi.org/10.1002/ece3.10757
work_keys_str_mv AT tozermarkg beyondcentraltendencyifweagreediscretevegetationcommunitiesdonotexistshouldweinvestigateothermethodsofclustering
AT keithdavida beyondcentraltendencyifweagreediscretevegetationcommunitiesdonotexistshouldweinvestigateothermethodsofclustering