Cargando…

Finding the mean in a partition distribution

BACKGROUND: Bayesian clustering algorithms, in particular those utilizing Dirichlet Processes (DP), return a sample of the posterior distribution of partitions of a set. However, in many applied cases a single clustering solution is desired, requiring a ’best’ partition to be created from the poster...

Descripción completa

Detalles Bibliográficos
Autores principales:	Glassen, Thomas J., Oertzen, Timo von, Konovalov, Dmitry A.
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2018
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6186144/ https://www.ncbi.nlm.nih.gov/pubmed/30314432 http://dx.doi.org/10.1186/s12859-018-2359-z

_version_	1783362821091753984
author	Glassen, Thomas J. Oertzen, Timo von Konovalov, Dmitry A.
author_facet	Glassen, Thomas J. Oertzen, Timo von Konovalov, Dmitry A.
author_sort	Glassen, Thomas J.
collection	PubMed
description	BACKGROUND: Bayesian clustering algorithms, in particular those utilizing Dirichlet Processes (DP), return a sample of the posterior distribution of partitions of a set. However, in many applied cases a single clustering solution is desired, requiring a ’best’ partition to be created from the posterior sample. It is an open research question which solution should be recommended in which situation. However, one such candidate is the sample mean, defined as the clustering with minimal squared distance to all partitions in the posterior sample, weighted by their probability. In this article, we review an algorithm that approximates this sample mean by using the Hungarian Method to compute the distance between partitions. This algorithm leaves room for further processing acceleration. RESULTS: We highlight a faster variant of the partition distance reduction that leads to a runtime complexity that is up to two orders of magnitude lower than the standard variant. We suggest two further improvements: The first is deterministic and based on an adapted dynamical version of the Hungarian Algorithm, which achieves another runtime decrease of at least one order of magnitude. The second improvement is theoretical and uses Monte Carlo techniques and the dynamic matrix inverse. Thereby we further reduce the runtime complexity by nearly the square root of one order of magnitude. CONCLUSIONS: Overall this results in a new mean partition algorithm with an acceleration factor reaching beyond that of the present algorithm by the size of the partitions. The new algorithm is implemented in Java and available on GitHub (Glassen, Mean Partition, 2018).
format	Online Article Text
id	pubmed-6186144
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-61861442018-10-19 Finding the mean in a partition distribution Glassen, Thomas J. Oertzen, Timo von Konovalov, Dmitry A. BMC Bioinformatics Research Article BACKGROUND: Bayesian clustering algorithms, in particular those utilizing Dirichlet Processes (DP), return a sample of the posterior distribution of partitions of a set. However, in many applied cases a single clustering solution is desired, requiring a ’best’ partition to be created from the posterior sample. It is an open research question which solution should be recommended in which situation. However, one such candidate is the sample mean, defined as the clustering with minimal squared distance to all partitions in the posterior sample, weighted by their probability. In this article, we review an algorithm that approximates this sample mean by using the Hungarian Method to compute the distance between partitions. This algorithm leaves room for further processing acceleration. RESULTS: We highlight a faster variant of the partition distance reduction that leads to a runtime complexity that is up to two orders of magnitude lower than the standard variant. We suggest two further improvements: The first is deterministic and based on an adapted dynamical version of the Hungarian Algorithm, which achieves another runtime decrease of at least one order of magnitude. The second improvement is theoretical and uses Monte Carlo techniques and the dynamic matrix inverse. Thereby we further reduce the runtime complexity by nearly the square root of one order of magnitude. CONCLUSIONS: Overall this results in a new mean partition algorithm with an acceleration factor reaching beyond that of the present algorithm by the size of the partitions. The new algorithm is implemented in Java and available on GitHub (Glassen, Mean Partition, 2018). BioMed Central 2018-10-12 /pmc/articles/PMC6186144/ /pubmed/30314432 http://dx.doi.org/10.1186/s12859-018-2359-z Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Glassen, Thomas J. Oertzen, Timo von Konovalov, Dmitry A. Finding the mean in a partition distribution
title	Finding the mean in a partition distribution
title_full	Finding the mean in a partition distribution
title_fullStr	Finding the mean in a partition distribution
title_full_unstemmed	Finding the mean in a partition distribution
title_short	Finding the mean in a partition distribution
title_sort	finding the mean in a partition distribution
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6186144/ https://www.ncbi.nlm.nih.gov/pubmed/30314432 http://dx.doi.org/10.1186/s12859-018-2359-z
work_keys_str_mv	AT glassenthomasj findingthemeaninapartitiondistribution AT oertzentimovon findingthemeaninapartitiondistribution AT konovalovdmitrya findingthemeaninapartitiondistribution

Finding the mean in a partition distribution

Ejemplares similares