
Clustering compositional data using Dirichlet mixture model

A model-based clustering method for compositional data is explored in this article. Most methods for compositional data analysis require some kind of transformation. The proposed method builds a mixture model using Dirichlet distribution which works with the unit sum constraint. The mixture model us...
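As a rough illustration of the idea in the abstract, the following is a minimal sketch of hard EM for a Dirichlet mixture, written with the Python standard library only. It is not the authors' implementation. The function names (`dirichlet_logpdf`, `sample_dirichlet`, `fit_alpha_moments`, `initial_labels`, `hard_em_dirichlet`) are hypothetical, the method-of-moments update and the farthest-point initialisation are assumptions chosen for brevity, and the rule of keeping the previous parameters for a cluster with fewer than two members is only a placeholder, since the record does not describe the article's own modification for empty clusters.

```python
import math
import random


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at a strictly positive composition x."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def sample_dirichlet(alpha, rng):
    """Draw one composition by normalising independent Gamma(alpha_j, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [v / total for v in draws]


def fit_alpha_moments(points):
    """Method-of-moments estimate of alpha (an assumption made for this sketch)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    var = [sum((p[j] - mean[j]) ** 2 for p in points) / (n - 1) for j in range(d)]
    # Var(x_j) = m_j (1 - m_j) / (s + 1) with s = sum(alpha); average the d estimates of s.
    s_est = [mean[j] * (1.0 - mean[j]) / max(var[j], 1e-12) - 1.0 for j in range(d)]
    s = max(sum(s_est) / d, 1e-3)
    return [max(m * s, 1e-6) for m in mean]


def initial_labels(data, k):
    """Farthest-point seeding, then nearest-seed assignment (illustrative choice)."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    seeds = [data[0]]
    while len(seeds) < k:
        seeds.append(max(data, key=lambda p: min(dist2(p, s) for s in seeds)))
    return [min(range(k), key=lambda c: dist2(p, seeds[c])) for p in data]


def hard_em_dirichlet(data, k, max_iter=50):
    """Alternate parameter updates and hard (argmax) assignments until labels stop changing."""
    n, d = len(data), len(data[0])
    labels = initial_labels(data, k)
    alphas = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(max_iter):
        # M-step: refit each cluster from its current members.
        counts = []
        for c in range(k):
            members = [p for p, z in zip(data, labels) if z == c]
            counts.append(max(len(members), 1))  # floor at 1 so log(weight) is defined
            if len(members) >= 2:
                alphas[c] = fit_alpha_moments(members)
            # Placeholder rule: a cluster with fewer than 2 members keeps its old alpha.
        total = sum(counts)
        weights = [cnt / total for cnt in counts]
        # Hard E-step: assign each point to the single most probable component.
        new_labels = [
            max(range(k), key=lambda c: math.log(weights[c]) + dirichlet_logpdf(x, alphas[c]))
            for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, alphas, weights
```

A call such as `hard_em_dirichlet(data, k=2)` expects `data` to be a list of strictly positive compositions that each sum to one; zero components would need separate handling before taking logarithms. Because the density is evaluated directly on the simplex, no log-ratio transformation is applied to the data.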


Bibliographic Details
Main Authors: Pal, Samyajoy; Heumann, Christian
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9116644/
https://www.ncbi.nlm.nih.gov/pubmed/35584127
http://dx.doi.org/10.1371/journal.pone.0268438
_version_ 1784710153341239296
author Pal, Samyajoy
Heumann, Christian
collection PubMed
description This article explores a model-based clustering method for compositional data. Most methods for compositional data analysis require some kind of transformation. The proposed method builds a mixture model using the Dirichlet distribution, which works directly with the unit-sum constraint. The mixture model uses a hard EM algorithm, modified to overcome the problem of fast convergence with empty clusters. The work includes a rigorous simulation study that evaluates the performance of the proposed method over varied dimensions, numbers of clusters, and degrees of overlap. The model is also compared with other popular clustering algorithms often used for compositional data analysis, including KMeans, the Gaussian mixture model (GMM), the Gaussian mixture model with hard EM (Hard GMM), partitioning around medoids (PAM), Clustering Large Applications based on Randomized Search (CLARANS), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The comparison covers simulated data as well as two real data problems, one from the business and marketing domain and one from the physical science domain. The study shows promising results that exploit different distributional patterns of compositional data.
format Online
Article
Text
id pubmed-9116644
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9116644 2022-05-19 Clustering compositional data using Dirichlet mixture model. Pal, Samyajoy; Heumann, Christian. PLoS One. Research Article. Public Library of Science 2022-05-18 /pmc/articles/PMC9116644/ /pubmed/35584127 http://dx.doi.org/10.1371/journal.pone.0268438 Text en © 2022 Pal, Heumann https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Clustering compositional data using Dirichlet mixture model
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9116644/
https://www.ncbi.nlm.nih.gov/pubmed/35584127
http://dx.doi.org/10.1371/journal.pone.0268438