Cargando…

Finding the mean in a partition distribution

BACKGROUND: Bayesian clustering algorithms, in particular those utilizing Dirichlet Processes (DP), return a sample of the posterior distribution of partitions of a set. However, in many applied cases a single clustering solution is desired, requiring a ’best’ partition to be created from the poster...

Descripción completa

Detalles Bibliográficos
Autores principales: Glassen, Thomas J., Oertzen, Timo von, Konovalov, Dmitry A.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6186144/
https://www.ncbi.nlm.nih.gov/pubmed/30314432
http://dx.doi.org/10.1186/s12859-018-2359-z
_version_ 1783362821091753984
author Glassen, Thomas J.
Oertzen, Timo von
Konovalov, Dmitry A.
author_facet Glassen, Thomas J.
Oertzen, Timo von
Konovalov, Dmitry A.
author_sort Glassen, Thomas J.
collection PubMed
description BACKGROUND: Bayesian clustering algorithms, in particular those utilizing Dirichlet Processes (DP), return a sample of the posterior distribution of partitions of a set. However, in many applied cases a single clustering solution is desired, requiring a ’best’ partition to be created from the posterior sample. It is an open research question which solution should be recommended in which situation. However, one such candidate is the sample mean, defined as the clustering with minimal squared distance to all partitions in the posterior sample, weighted by their probability. In this article, we review an algorithm that approximates this sample mean by using the Hungarian Method to compute the distance between partitions. This algorithm leaves room for further processing acceleration. RESULTS: We highlight a faster variant of the partition distance reduction that leads to a runtime complexity that is up to two orders of magnitude lower than the standard variant. We suggest two further improvements: The first is deterministic and based on an adapted dynamical version of the Hungarian Algorithm, which achieves another runtime decrease of at least one order of magnitude. The second improvement is theoretical and uses Monte Carlo techniques and the dynamic matrix inverse. Thereby we further reduce the runtime complexity by nearly the square root of one order of magnitude. CONCLUSIONS: Overall this results in a new mean partition algorithm with an acceleration factor reaching beyond that of the present algorithm by the size of the partitions. The new algorithm is implemented in Java and available on GitHub (Glassen, Mean Partition, 2018).
format Online
Article
Text
id pubmed-6186144
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-61861442018-10-19 Finding the mean in a partition distribution Glassen, Thomas J. Oertzen, Timo von Konovalov, Dmitry A. BMC Bioinformatics Research Article BACKGROUND: Bayesian clustering algorithms, in particular those utilizing Dirichlet Processes (DP), return a sample of the posterior distribution of partitions of a set. However, in many applied cases a single clustering solution is desired, requiring a ’best’ partition to be created from the posterior sample. It is an open research question which solution should be recommended in which situation. However, one such candidate is the sample mean, defined as the clustering with minimal squared distance to all partitions in the posterior sample, weighted by their probability. In this article, we review an algorithm that approximates this sample mean by using the Hungarian Method to compute the distance between partitions. This algorithm leaves room for further processing acceleration. RESULTS: We highlight a faster variant of the partition distance reduction that leads to a runtime complexity that is up to two orders of magnitude lower than the standard variant. We suggest two further improvements: The first is deterministic and based on an adapted dynamical version of the Hungarian Algorithm, which achieves another runtime decrease of at least one order of magnitude. The second improvement is theoretical and uses Monte Carlo techniques and the dynamic matrix inverse. Thereby we further reduce the runtime complexity by nearly the square root of one order of magnitude. CONCLUSIONS: Overall this results in a new mean partition algorithm with an acceleration factor reaching beyond that of the present algorithm by the size of the partitions. The new algorithm is implemented in Java and available on GitHub (Glassen, Mean Partition, 2018). BioMed Central 2018-10-12 /pmc/articles/PMC6186144/ /pubmed/30314432 http://dx.doi.org/10.1186/s12859-018-2359-z Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Glassen, Thomas J.
Oertzen, Timo von
Konovalov, Dmitry A.
Finding the mean in a partition distribution
title Finding the mean in a partition distribution
title_full Finding the mean in a partition distribution
title_fullStr Finding the mean in a partition distribution
title_full_unstemmed Finding the mean in a partition distribution
title_short Finding the mean in a partition distribution
title_sort finding the mean in a partition distribution
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6186144/
https://www.ncbi.nlm.nih.gov/pubmed/30314432
http://dx.doi.org/10.1186/s12859-018-2359-z
work_keys_str_mv AT glassenthomasj findingthemeaninapartitiondistribution
AT oertzentimovon findingthemeaninapartitiondistribution
AT konovalovdmitrya findingthemeaninapartitiondistribution