
Model-based clustering based on sparse finite Gaussian mixtures

In the framework of Bayesian model-based clustering based on a finite mixture of Gaussian distributions, we present a joint approach to estimate the number of mixture components and identify cluster-relevant variables simultaneously as well as to obtain an identified model. Our approach consists in...


Bibliographic Details
Main Authors: Malsiner-Walli, Gertraud; Frühwirth-Schnatter, Sylvia; Grün, Bettina
Format: Online Article Text
Language: English
Published: Springer US 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4750551/
https://www.ncbi.nlm.nih.gov/pubmed/26900266
http://dx.doi.org/10.1007/s11222-014-9500-2
_version_ 1782415450837090304
author Malsiner-Walli, Gertraud
Frühwirth-Schnatter, Sylvia
Grün, Bettina
author_facet Malsiner-Walli, Gertraud
Frühwirth-Schnatter, Sylvia
Grün, Bettina
author_sort Malsiner-Walli, Gertraud
collection PubMed
description In the framework of Bayesian model-based clustering based on a finite mixture of Gaussian distributions, we present a joint approach to estimate the number of mixture components and identify cluster-relevant variables simultaneously as well as to obtain an identified model. Our approach consists in specifying sparse hierarchical priors on the mixture weights and component means. In a deliberately overfitting mixture model, the sparse prior on the weights empties superfluous components during MCMC. A straightforward estimator for the true number of components is given by the most frequent number of non-empty components visited during MCMC sampling. Specifying a shrinkage prior, namely the normal gamma prior, on the component means leads to improved parameter estimates as well as identification of cluster-relevant variables. After estimating the mixture model using MCMC methods based on data augmentation and Gibbs sampling, an identified model is obtained by relabeling the MCMC output in the point process representation of the draws. This is performed using K-centroids cluster analysis based on the Mahalanobis distance. We evaluate our proposed strategy in a simulation setup with artificial data and by applying it to benchmark data sets.
format Online
Article
Text
id pubmed-4750551
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-47505512016-02-19 Model-based clustering based on sparse finite Gaussian mixtures Malsiner-Walli, Gertraud Frühwirth-Schnatter, Sylvia Grün, Bettina Stat Comput Article In the framework of Bayesian model-based clustering based on a finite mixture of Gaussian distributions, we present a joint approach to estimate the number of mixture components and identify cluster-relevant variables simultaneously as well as to obtain an identified model. Our approach consists in specifying sparse hierarchical priors on the mixture weights and component means. In a deliberately overfitting mixture model, the sparse prior on the weights empties superfluous components during MCMC. A straightforward estimator for the true number of components is given by the most frequent number of non-empty components visited during MCMC sampling. Specifying a shrinkage prior, namely the normal gamma prior, on the component means leads to improved parameter estimates as well as identification of cluster-relevant variables. After estimating the mixture model using MCMC methods based on data augmentation and Gibbs sampling, an identified model is obtained by relabeling the MCMC output in the point process representation of the draws. This is performed using K-centroids cluster analysis based on the Mahalanobis distance. We evaluate our proposed strategy in a simulation setup with artificial data and by applying it to benchmark data sets. Springer US 2014-08-26 2016 /pmc/articles/PMC4750551/ /pubmed/26900266 http://dx.doi.org/10.1007/s11222-014-9500-2 Text en © The Author(s) 2014 https://creativecommons.org/licenses/by/4.0/ Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
spellingShingle Article
Malsiner-Walli, Gertraud
Frühwirth-Schnatter, Sylvia
Grün, Bettina
Model-based clustering based on sparse finite Gaussian mixtures
title Model-based clustering based on sparse finite Gaussian mixtures
title_full Model-based clustering based on sparse finite Gaussian mixtures
title_fullStr Model-based clustering based on sparse finite Gaussian mixtures
title_full_unstemmed Model-based clustering based on sparse finite Gaussian mixtures
title_short Model-based clustering based on sparse finite Gaussian mixtures
title_sort model-based clustering based on sparse finite gaussian mixtures
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4750551/
https://www.ncbi.nlm.nih.gov/pubmed/26900266
http://dx.doi.org/10.1007/s11222-014-9500-2
work_keys_str_mv AT malsinerwalligertraud modelbasedclusteringbasedonsparsefinitegaussianmixtures
AT fruhwirthschnattersylvia modelbasedclusteringbasedonsparsefinitegaussianmixtures
AT grunbettina modelbasedclusteringbasedonsparsefinitegaussianmixtures
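
The description in this record states that the number of components can be estimated as the most frequent number of non-empty components visited during MCMC sampling. The following is a minimal sketch of that counting step only, not the authors' implementation. It assumes the sampler output is available as one list of component labels per MCMC draw. The function name, the toy draws, and the tie-breaking rule (prefer the smaller count) are hypothetical choices made for illustration.

```python
from collections import Counter


def estimate_num_components(allocation_draws):
    """Return the most frequent number of non-empty components.

    allocation_draws: iterable of draws, each a sequence of component
    labels (one label per observation). This input layout is an assumption.
    """
    # A component is non-empty in a draw if at least one observation
    # is allocated to it, so count the distinct labels per draw.
    non_empty_counts = Counter(len(set(draw)) for draw in allocation_draws)

    # Take the mode; on ties, prefer the smaller count (an assumption,
    # not something the record specifies).
    top_frequency = max(non_empty_counts.values())
    return min(k for k, freq in non_empty_counts.items() if freq == top_frequency)


# Toy allocations for five observations over four MCMC draws (made-up data).
draws = [
    [0, 0, 1, 1, 2],  # three non-empty components
    [0, 0, 1, 1, 1],  # two non-empty components
    [0, 2, 2, 2, 0],  # two non-empty components
    [3, 3, 1, 1, 1],  # two non-empty components
]
k_hat = estimate_num_components(draws)
print(k_hat)
```

Counting distinct labels rather than inspecting the sampled weights keeps the sketch independent of label switching, since only the number of occupied components matters for this estimator.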