Cargando…

Generalising Ward’s Method for Use with Manhattan Distances

The claim that Ward’s linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward’s clustering algorithm is generalised to use with l(1) norm or Manhattan distances. We argue that the generalisation of Ward’s linkage method to incorpor...

Descripción completa

Detalles Bibliográficos
Autores principales: Strauss, Trudie, von Maltitz, Michael Johan
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2017
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5235383/
https://www.ncbi.nlm.nih.gov/pubmed/28085891
http://dx.doi.org/10.1371/journal.pone.0168288
_version_ 1782495146620747776
author Strauss, Trudie
von Maltitz, Michael Johan
author_facet Strauss, Trudie
von Maltitz, Michael Johan
author_sort Strauss, Trudie
collection PubMed
description The claim that Ward’s linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward’s clustering algorithm is generalised to use with l(1) norm or Manhattan distances. We argue that the generalisation of Ward’s linkage method to incorporate Manhattan distances is theoretically sound and provide an example of where this method outperforms the method using Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied to biology and genetic classification. We aim to quantify differences in character traits between languages and use a statistical language signature based on relative bi-gram (sequence of two letters) frequencies to calculate a distance matrix between 32 Indo-European languages. We then use Ward’s method of hierarchical clustering to classify the languages, using the Euclidean distance and the Manhattan distance. Results obtained from using the different distance metrics are compared to show that the Ward’s algorithm characteristic of minimising intra-cluster variation and maximising inter-cluster variation is not violated when using the Manhattan metric.
format Online
Article
Text
id pubmed-5235383
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-52353832017-02-06 Generalising Ward’s Method for Use with Manhattan Distances Strauss, Trudie von Maltitz, Michael Johan PLoS One Research Article The claim that Ward’s linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward’s clustering algorithm is generalised to use with l(1) norm or Manhattan distances. We argue that the generalisation of Ward’s linkage method to incorporate Manhattan distances is theoretically sound and provide an example of where this method outperforms the method using Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied to biology and genetic classification. We aim to quantify differences in character traits between languages and use a statistical language signature based on relative bi-gram (sequence of two letters) frequencies to calculate a distance matrix between 32 Indo-European languages. We then use Ward’s method of hierarchical clustering to classify the languages, using the Euclidean distance and the Manhattan distance. Results obtained from using the different distance metrics are compared to show that the Ward’s algorithm characteristic of minimising intra-cluster variation and maximising inter-cluster variation is not violated when using the Manhattan metric. Public Library of Science 2017-01-13 /pmc/articles/PMC5235383/ /pubmed/28085891 http://dx.doi.org/10.1371/journal.pone.0168288 Text en © 2017 Strauss, von Maltitz http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Strauss, Trudie
von Maltitz, Michael Johan
Generalising Ward’s Method for Use with Manhattan Distances
title Generalising Ward’s Method for Use with Manhattan Distances
title_full Generalising Ward’s Method for Use with Manhattan Distances
title_fullStr Generalising Ward’s Method for Use with Manhattan Distances
title_full_unstemmed Generalising Ward’s Method for Use with Manhattan Distances
title_short Generalising Ward’s Method for Use with Manhattan Distances
title_sort generalising ward’s method for use with manhattan distances
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5235383/
https://www.ncbi.nlm.nih.gov/pubmed/28085891
http://dx.doi.org/10.1371/journal.pone.0168288
work_keys_str_mv AT strausstrudie generalisingwardsmethodforusewithmanhattandistances
AT vonmaltitzmichaeljohan generalisingwardsmethodforusewithmanhattandistances