
An Integrated Approach for Making Inference on the Number of Clusters in a Mixture Model

This paper presents an integrated approach for the estimation of the parameters of a mixture model in the context of data clustering. The method is designed to estimate the unknown number of clusters from observed data. For this, we marginalize out the weights for getting allocation probabilities th...


Bibliographic Details
Main Authors: Saraiva, Erlandson Ferreira; Suzuki, Adriano Kamimura; Milan, Luis Aparecido; Pereira, Carlos Alberto de Bragança
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514367/
http://dx.doi.org/10.3390/e21111063
author Saraiva, Erlandson Ferreira
Suzuki, Adriano Kamimura
Milan, Luis Aparecido
Pereira, Carlos Alberto de Bragança
collection PubMed
description This paper presents an integrated approach for estimating the parameters of a mixture model in the context of data clustering. The method is designed to estimate the unknown number of clusters from the observed data. To this end, we marginalize out the weights to obtain allocation probabilities that depend on the number of clusters but not on the number of components of the mixture model. As an alternative to the stochastic expectation maximization (SEM) algorithm, we propose the integrated stochastic expectation maximization (ISEM) algorithm, which, in contrast to SEM, does not require the number of mixture components to be specified a priori. With this algorithm, the parameters of clusters containing at least two observations are estimated via local maximization of the likelihood function. In addition, at each iteration of the algorithm, there is a positive probability that a single observation creates a new cluster. Using simulated datasets, we compare the performance of the ISEM algorithm against both the SEM and reversible jump (RJ) algorithms. The results show that ISEM outperforms both SEM and RJ. We also report the performance of the three algorithms on two real datasets.
format Online
Article
Text
id pubmed-7514367
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7514367 2020-11-09 An Integrated Approach for Making Inference on the Number of Clusters in a Mixture Model. Saraiva, Erlandson Ferreira; Suzuki, Adriano Kamimura; Milan, Luis Aparecido; Pereira, Carlos Alberto de Bragança. Entropy (Basel). Article. MDPI 2019-10-30 /pmc/articles/PMC7514367/ http://dx.doi.org/10.3390/e21111063 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title An Integrated Approach for Making Inference on the Number of Clusters in a Mixture Model
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514367/
http://dx.doi.org/10.3390/e21111063