Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations

We consider the question of Markov chain Monte Carlo sampling from a general stick-breaking Dirichlet process mixture model, with concentration parameter α. This paper introduces a Gibbs sampling algorithm that combines the slice sampling approach of Walker (Communications in Stat...

Bibliographic Details
Main authors: Hastie, David I., Liverani, Silvia, Richardson, Sylvia
Format: Online Article Text
Language: English
Published: Springer US 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4550296/
https://www.ncbi.nlm.nih.gov/pubmed/26321800
http://dx.doi.org/10.1007/s11222-014-9471-3
author Hastie, David I.
Liverani, Silvia
Richardson, Sylvia
collection PubMed
description We consider the question of Markov chain Monte Carlo sampling from a general stick-breaking Dirichlet process mixture model, with concentration parameter α. This paper introduces a Gibbs sampling algorithm that combines the slice sampling approach of Walker (Communications in Statistics - Simulation and Computation 36:45–54, 2007) and the retrospective sampling approach of Papaspiliopoulos and Roberts (Biometrika 95(1):169–186, 2008). Our general algorithm is implemented as efficient open source C++ software, available as an R package, and is based on a blocking strategy similar to that suggested by Papaspiliopoulos (A note on posterior sampling from Dirichlet mixture models, 2008) and implemented by Yau et al. (Journal of the Royal Statistical Society, Series B (Statistical Methodology) 73:37–57, 2011). We discuss the difficulties of achieving good mixing in MCMC samplers of this nature in large data sets and investigate sensitivity to initialisation. We additionally consider the challenges when an additional layer of hierarchy is added such that joint inference is to be made on α. We introduce a new label-switching move and compute the marginal partition posterior to help to surmount these difficulties. Our work is illustrated using a profile regression (Molitor et al. Biostatistics 11(3):484–498, 2010) application, where we demonstrate good mixing behaviour for both synthetic and real examples. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11222-014-9471-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4550296
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-4550296 2015-08-28 Stat Comput Article
Springer US 2014-05-03 2015 /pmc/articles/PMC4550296/ /pubmed/26321800 http://dx.doi.org/10.1007/s11222-014-9471-3 Text en © The Author(s) 2014 https://creativecommons.org/licenses/by/4.0/ Open AccessThis article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
title Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations
topic Article