An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems


Bibliographic Details
Main authors: Dawson, Kevin J., Belkhir, Khalid
Format: Text
Language: English
Published: 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705916/
https://www.ncbi.nlm.nih.gov/pubmed/19337306
http://dx.doi.org/10.1038/hdy.2009.29
author Dawson, Kevin J.
Belkhir, Khalid
collection PubMed
description Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these problems, the parameter of interest is a partition of the set of sampled individuals: the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Because the number of possible partitions grows very rapidly with the sample size, we cannot visualise this probability distribution in its entirety unless the sample is very small. As a solution to this visualisation problem, we recommend an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously, and it is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input and yields as output a rooted binary tree or, more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities.
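The description above can be illustrated with a minimal sketch. It is not the authors' implementation and not the Partition View code. It assumes that the posterior is available as a list of sampled partitions (for example from an MCMC run), that the co-assignment probability of a set is estimated as the fraction of samples in which all its members share a cluster, and that clusters are merged greedily by the highest co-assignment probability of their union, stopping when that probability is zero. The function names `coassignment_probability` and `greedy_coassignment_forest` are hypothetical.

```python
from itertools import combinations


def coassignment_probability(samples, members):
    """Estimate the probability that all `members` share one cluster.

    `samples` is a list of label vectors, one per sampled partition;
    labels[i] is the cluster label of individual i in that sample.
    """
    members = list(members)
    hits = sum(1 for labels in samples
               if len({labels[i] for i in members}) == 1)
    return hits / len(samples)


def greedy_coassignment_forest(samples):
    """Greedy agglomeration driven by set co-assignment probabilities.

    Assumed merge rule (illustrative only): at each step, join the two
    current clusters whose union has the highest estimated co-assignment
    probability, and record that probability as the node height. Stop
    when no union has positive probability, which leaves a forest.
    Returns (merges, roots): merges is a list of (sorted members, height),
    roots is the list of clusters remaining at the top of the forest.
    """
    n = len(samples[0])
    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = max(
            combinations(clusters, 2),
            key=lambda pair: coassignment_probability(samples, pair[0] | pair[1]),
        )
        merged = best[0] | best[1]
        height = coassignment_probability(samples, merged)
        if height == 0.0:
            break  # remaining clusters are never co-assigned: separate trees
        clusters = [c for c in clusters if c not in best] + [merged]
        merges.append((sorted(merged), height))
    return merges, clusters


# Toy posterior sample: four partitions of four individuals (made-up data).
toy_samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 0, 0, 1],
    [0, 1, 2, 2],
]
toy_merges, toy_roots = greedy_coassignment_forest(toy_samples)
for members, height in toy_merges:
    print(members, height)
```

On this toy input the sketch joins individuals 0 and 1 at height 0.75 and individuals 2 and 3 at height 0.5. All four individuals are never co-assigned in any sample, so the result is a forest of two trees rather than a single tree.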
format Text
id pubmed-2705916
institution National Center for Biotechnology Information
language English
publishDate 2009
record_format MEDLINE/PubMed
spelling pubmed-2705916 2010-01-01 Heredity (Edinb) Article 2009-04-01 2009-07 Text en
title An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705916/
https://www.ncbi.nlm.nih.gov/pubmed/19337306
http://dx.doi.org/10.1038/hdy.2009.29