
Nonunique UPGMA clusterings of microsatellite markers

Agglomerative hierarchical clustering has become a common tool for the analysis and visualization of data, thus being present in a large amount of scientific research and pervading all areas of bioinformatics and computational biology. In this work, we focus on a critical problem, the nonuniqueness...


Bibliographic Details
Main Authors: Segura-Alabart, Natàlia, Serratosa, Francesc, Gómez, Sergio, Fernández, Alberto
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487649/
https://www.ncbi.nlm.nih.gov/pubmed/35915053
http://dx.doi.org/10.1093/bib/bbac312
_version_ 1784792497735598080
author Segura-Alabart, Natàlia
Serratosa, Francesc
Gómez, Sergio
Fernández, Alberto
author_facet Segura-Alabart, Natàlia
Serratosa, Francesc
Gómez, Sergio
Fernández, Alberto
author_sort Segura-Alabart, Natàlia
collection PubMed
description Agglomerative hierarchical clustering has become a common tool for the analysis and visualization of data, thus being present in a large amount of scientific research and pervading all areas of bioinformatics and computational biology. In this work, we focus on a critical problem, the nonuniqueness of the clustering when there are tied distances, for which several solutions exist but are not implemented in most hierarchical clustering packages. We analyze the magnitude of this problem in one particular setting: the clustering of microsatellite markers using the Unweighted Pair-Group Method with Arithmetic Mean. To do so, we have calculated the fraction of publications in the Scopus database in which more than one hierarchical clustering is possible, showing that about 46% of the articles are affected. Additionally, to show the problem from a practical point of view, we selected two opposite examples of articles that have multiple solutions: one with two possible dendrograms, and the other with more than 2.5 million different possible hierarchical clusterings.
format Online
Article
Text
id pubmed-9487649
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9487649 2022-09-21 Nonunique UPGMA clusterings of microsatellite markers Segura-Alabart, Natàlia Serratosa, Francesc Gómez, Sergio Fernández, Alberto Brief Bioinform Review Agglomerative hierarchical clustering has become a common tool for the analysis and visualization of data, thus being present in a large amount of scientific research and pervading all areas of bioinformatics and computational biology. In this work, we focus on a critical problem, the nonuniqueness of the clustering when there are tied distances, for which several solutions exist but are not implemented in most hierarchical clustering packages. We analyze the magnitude of this problem in one particular setting: the clustering of microsatellite markers using the Unweighted Pair-Group Method with Arithmetic Mean. To do so, we have calculated the fraction of publications in the Scopus database in which more than one hierarchical clustering is possible, showing that about 46% of the articles are affected. Additionally, to show the problem from a practical point of view, we selected two opposite examples of articles that have multiple solutions: one with two possible dendrograms, and the other with more than 2.5 million different possible hierarchical clusterings. Oxford University Press 2022-08-01 /pmc/articles/PMC9487649/ /pubmed/35915053 http://dx.doi.org/10.1093/bib/bbac312 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Review
Segura-Alabart, Natàlia
Serratosa, Francesc
Gómez, Sergio
Fernández, Alberto
Nonunique UPGMA clusterings of microsatellite markers
title Nonunique UPGMA clusterings of microsatellite markers
title_full Nonunique UPGMA clusterings of microsatellite markers
title_fullStr Nonunique UPGMA clusterings of microsatellite markers
title_full_unstemmed Nonunique UPGMA clusterings of microsatellite markers
title_short Nonunique UPGMA clusterings of microsatellite markers
title_sort nonunique upgma clusterings of microsatellite markers
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487649/
https://www.ncbi.nlm.nih.gov/pubmed/35915053
http://dx.doi.org/10.1093/bib/bbac312
work_keys_str_mv AT seguraalabartnatalia nonuniqueupgmaclusteringsofmicrosatellitemarkers
AT serratosafrancesc nonuniqueupgmaclusteringsofmicrosatellitemarkers
AT gomezsergio nonuniqueupgmaclusteringsofmicrosatellitemarkers
AT fernandezalberto nonuniqueupgmaclusteringsofmicrosatellitemarkers
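
The abstract's central point, that tied distances can make the UPGMA result nonunique, can be illustrated with a small sketch. The Python below is not the authors' method or any package's implementation; the function names (`avg_dist`, `upgma_all`) and the toy distances are hypothetical and chosen only for illustration. It assumes the standard UPGMA definition (the distance between two clusters is the arithmetic mean of all pairwise distances between their members) and, whenever several pairs share the minimum distance, follows every possible merge instead of picking one arbitrarily.

```python
from itertools import combinations


def avg_dist(a, b, d):
    """UPGMA distance: mean of all pairwise distances between members of a and b."""
    return sum(d[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))


def upgma_all(clusters, d, merges=frozenset(), tol=1e-9):
    """Return every distinct dendrogram reachable by breaking ties in all ways.

    A dendrogram is represented as a frozenset of (cluster, height) merges,
    so two merge orders that build the same tree are counted once.
    """
    if len(clusters) == 1:
        return {merges}
    pairs = [(avg_dist(a, b, d), a, b) for a, b in combinations(clusters, 2)]
    dmin = min(p[0] for p in pairs)
    results = set()
    for dist, a, b in pairs:
        if dist - dmin <= tol:  # this pair is tied for the minimum distance
            rest = [c for c in clusters if c not in (a, b)]
            merged = merges | {(a | b, round(dist, 9))}
            results |= upgma_all(rest + [a | b], d, merged, tol)
    return results


# Hypothetical toy data: A is equally close to B and to C, so the first merge is tied.
tied = {frozenset("AB"): 1.0, frozenset("AC"): 1.0, frozenset("BC"): 2.0}
# Same items without ties: only one dendrogram is possible.
untied = {frozenset("AB"): 1.0, frozenset("AC"): 2.0, frozenset("BC"): 3.0}

leaves = [frozenset(c) for c in "ABC"]
n_tied = len(upgma_all(leaves, tied))
n_untied = len(upgma_all(leaves, untied))
print(n_tied, n_untied)
```

Because every tie branches the search, this brute-force enumeration grows very quickly with the number of ties and is only practical for small examples.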