Cargando…
Finding the mean in a partition distribution
BACKGROUND: Bayesian clustering algorithms, in particular those utilizing Dirichlet Processes (DP), return a sample of the posterior distribution of partitions of a set. However, in many applied cases a single clustering solution is desired, requiring a ’best’ partition to be created from the poster...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2018
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6186144/ https://www.ncbi.nlm.nih.gov/pubmed/30314432 http://dx.doi.org/10.1186/s12859-018-2359-z |
_version_ | 1783362821091753984 |
---|---|
author | Glassen, Thomas J. Oertzen, Timo von Konovalov, Dmitry A. |
author_facet | Glassen, Thomas J. Oertzen, Timo von Konovalov, Dmitry A. |
author_sort | Glassen, Thomas J. |
collection | PubMed |
description | BACKGROUND: Bayesian clustering algorithms, in particular those utilizing Dirichlet Processes (DP), return a sample of the posterior distribution of partitions of a set. However, in many applied cases a single clustering solution is desired, requiring a ’best’ partition to be created from the posterior sample. It is an open research question which solution should be recommended in which situation. However, one such candidate is the sample mean, defined as the clustering with minimal squared distance to all partitions in the posterior sample, weighted by their probability. In this article, we review an algorithm that approximates this sample mean by using the Hungarian Method to compute the distance between partitions. This algorithm leaves room for further processing acceleration. RESULTS: We highlight a faster variant of the partition distance reduction that leads to a runtime complexity that is up to two orders of magnitude lower than the standard variant. We suggest two further improvements: The first is deterministic and based on an adapted dynamical version of the Hungarian Algorithm, which achieves another runtime decrease of at least one order of magnitude. The second improvement is theoretical and uses Monte Carlo techniques and the dynamic matrix inverse. Thereby we further reduce the runtime complexity by nearly the square root of one order of magnitude. CONCLUSIONS: Overall this results in a new mean partition algorithm with an acceleration factor reaching beyond that of the present algorithm by the size of the partitions. The new algorithm is implemented in Java and available on GitHub (Glassen, Mean Partition, 2018). |
format | Online Article Text |
id | pubmed-6186144 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-61861442018-10-19 Finding the mean in a partition distribution Glassen, Thomas J. Oertzen, Timo von Konovalov, Dmitry A. BMC Bioinformatics Research Article BACKGROUND: Bayesian clustering algorithms, in particular those utilizing Dirichlet Processes (DP), return a sample of the posterior distribution of partitions of a set. However, in many applied cases a single clustering solution is desired, requiring a ’best’ partition to be created from the posterior sample. It is an open research question which solution should be recommended in which situation. However, one such candidate is the sample mean, defined as the clustering with minimal squared distance to all partitions in the posterior sample, weighted by their probability. In this article, we review an algorithm that approximates this sample mean by using the Hungarian Method to compute the distance between partitions. This algorithm leaves room for further processing acceleration. RESULTS: We highlight a faster variant of the partition distance reduction that leads to a runtime complexity that is up to two orders of magnitude lower than the standard variant. We suggest two further improvements: The first is deterministic and based on an adapted dynamical version of the Hungarian Algorithm, which achieves another runtime decrease of at least one order of magnitude. The second improvement is theoretical and uses Monte Carlo techniques and the dynamic matrix inverse. Thereby we further reduce the runtime complexity by nearly the square root of one order of magnitude. CONCLUSIONS: Overall this results in a new mean partition algorithm with an acceleration factor reaching beyond that of the present algorithm by the size of the partitions. The new algorithm is implemented in Java and available on GitHub (Glassen, Mean Partition, 2018). BioMed Central 2018-10-12 /pmc/articles/PMC6186144/ /pubmed/30314432 http://dx.doi.org/10.1186/s12859-018-2359-z Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Glassen, Thomas J. Oertzen, Timo von Konovalov, Dmitry A. Finding the mean in a partition distribution |
title | Finding the mean in a partition distribution |
title_full | Finding the mean in a partition distribution |
title_fullStr | Finding the mean in a partition distribution |
title_full_unstemmed | Finding the mean in a partition distribution |
title_short | Finding the mean in a partition distribution |
title_sort | finding the mean in a partition distribution |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6186144/ https://www.ncbi.nlm.nih.gov/pubmed/30314432 http://dx.doi.org/10.1186/s12859-018-2359-z |
work_keys_str_mv | AT glassenthomasj findingthemeaninapartitiondistribution AT oertzentimovon findingthemeaninapartitiondistribution AT konovalovdmitrya findingthemeaninapartitiondistribution |