How to Use Model-Based Cluster Analysis Efficiently in Person-Oriented Research

Model-based cluster analysis (MBCA) was created to automate the often subjective model-selection procedure of traditional exploratory clustering methods. It is a type of finite mixture modelling, assuming that the data come from a mixture of different subpopulations following given distributions,...

Bibliographic Details
Main authors:	Gergely, Bence; Vargha, András
Format:	Online Article Text
Language:	English
Published:	Scandinavian Society for Person-Oriented Research 2021
Subjects:	Articles
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8411881/ https://www.ncbi.nlm.nih.gov/pubmed/34548917 http://dx.doi.org/10.17505/jpor.2021.23449

_version_	1783747364448632832
author	Gergely, Bence Vargha, András
author_facet	Gergely, Bence Vargha, András
author_sort	Gergely, Bence
collection	PubMed
description	Model-based cluster analysis (MBCA) was created to automate the often subjective model-selection procedure of traditional exploratory clustering methods. It is a type of finite mixture modelling, which assumes that the data come from a mixture of different subpopulations following given distributions, typically multivariate normal. In that case, cluster analysis is the exploration of the underlying mixture structure. In MBCA, finding the possible number of clusters and the best clustering model is a statistical model-selection problem in which models with differing numbers and types of component distributions are compared. To evaluate how appropriate a fitted model is, MBCA uses the likelihood-based Bayesian Information Criterion (BIC), and the model with the highest BIC value is accepted as the final solution. The aim of the present study is to investigate the adequacy of automatic model selection in MBCA using BIC and suggested alternatives, such as the Integrated Completed Likelihood Criterion (ICL) and Baudry’s method. An additional aim is to refine these procedures by using so-called quality coefficients (QCs), borrowed from methodological advances within the field of exploratory cluster analysis, to help in the choice of an appropriate cluster structure (CLS), and also to compare the efficiency of MBCA in identifying a theoretical CLS with that of various other clustering methods. The analyses are restricted to the performance of these procedures in two classification situations typical of person-oriented studies: (1) three data sets generated from an example data set with a perfect theoretical CLS of seven types (seven completely homogeneous clusters) by adding varying degrees of measurement error to the original values, and (2) three additional data sets based on another perfect theoretical CLS with four types.
It was found that the automatic decision rarely led to an optimal solution. However, by dropping solutions with irregular BIC curves and by using different QCs as an aid in choosing between the solutions generated by MBCA and by fusing close clusters, optimal solutions were achieved for the two classification situations studied. With this refined procedure, the cluster solutions of MBCA often proved to be at least as good as those of different hierarchical and k-center clustering methods. MBCA was definitely superior in identifying four-type CLS models. In identifying seven-type CLS models, MBCA performed at a level similar to that of the best of the other clustering methods (such as k-means) only when the reliability of the input variables was high or moderate; otherwise it was slightly less efficient.
format	Online Article Text
id	pubmed-8411881
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Scandinavian Society for Person-Oriented Research
record_format	MEDLINE/PubMed
spelling	pubmed-8411881 2021-09-20 How to Use Model-Based Cluster Analysis Efficiently in Person-Oriented Research Gergely, Bence Vargha, András J Pers Oriented Res Articles Scandinavian Society for Person-Oriented Research 2021-08-26 /pmc/articles/PMC8411881/ /pubmed/34548917 http://dx.doi.org/10.17505/jpor.2021.23449 Text en © Person-Oriented Research https://person-research.org/journal/ Authors of articles published in Journal for Person-Oriented Research retain the copyright of their articles and are free to reproduce and disseminate their work.
spellingShingle	Articles Gergely, Bence Vargha, András How to Use Model-Based Cluster Analysis Efficiently in Person-Oriented Research
title	How to Use Model-Based Cluster Analysis Efficiently in Person-Oriented Research
title_full	How to Use Model-Based Cluster Analysis Efficiently in Person-Oriented Research
title_fullStr	How to Use Model-Based Cluster Analysis Efficiently in Person-Oriented Research
title_full_unstemmed	How to Use Model-Based Cluster Analysis Efficiently in Person-Oriented Research
title_short	How to Use Model-Based Cluster Analysis Efficiently in Person-Oriented Research
title_sort	how to use model-based cluster analysis efficiently in person-oriented research
topic	Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8411881/ https://www.ncbi.nlm.nih.gov/pubmed/34548917 http://dx.doi.org/10.17505/jpor.2021.23449
work_keys_str_mv	AT gergelybence howtousemodelbasedclusteranalysisefficientlyinpersonorientedresearch AT varghaandras howtousemodelbasedclusteranalysisefficientlyinpersonorientedresearch
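
Illustrative note on the description field: the abstract says that MBCA compares mixture models with different numbers of components and accepts the one with the highest BIC. The sketch below shows that selection rule in its simplest form and is not the authors' procedure. It assumes one-dimensional data, components with unequal variances, and the sign convention BIC = 2 log L - p log n (higher is better, matching the abstract). The function names fit_gmm_1d and bic and the synthetic data are hypothetical, chosen only for this example. ICL, Baudry’s method, and the quality coefficients are not covered.

```python
import math
import random


def fit_gmm_1d(xs, k, n_iter=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns the log-likelihood evaluated at the last E-step.
    """
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, pooled variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    grand = sum(xs) / n
    pooled = max(sum((x - grand) ** 2 for x in xs) / n, var_floor)
    variances = [pooled] * k
    weights = [1.0 / k] * k
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities, using log-sum-exp for numerical stability.
        resp, new_loglik = [], 0.0
        for x in xs:
            logs = [
                math.log(weights[j])
                - 0.5 * math.log(2 * math.pi * variances[j])
                - 0.5 * (x - means[j]) ** 2 / variances[j]
                for j in range(k)
            ]
            top = max(logs)
            total = top + math.log(sum(math.exp(v - top) for v in logs))
            new_loglik += total
            resp.append([math.exp(v - total) for v in logs])
        # M-step: update weights, means, and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, var_floor)
        converged = abs(new_loglik - loglik) < 1e-8
        loglik = new_loglik
        if converged:
            break
    return loglik


def bic(loglik, k, n):
    """BIC with the 'higher is better' sign convention.

    Free parameters in 1-D with unequal variances:
    (k - 1) weights + k means + k variances = 3k - 1.
    """
    n_params = 3 * k - 1
    return 2.0 * loglik - n_params * math.log(n)


# Synthetic data (an assumption of this sketch): two well-separated groups.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
data += [rng.gauss(8.0, 1.0) for _ in range(100)]

# Fit candidate models with 1 to 4 components and keep the highest BIC.
bic_by_k = {k: bic(fit_gmm_1d(data, k), k, len(data)) for k in range(1, 5)}
best_k = max(bic_by_k, key=bic_by_k.get)
print(bic_by_k, best_k)
```

Plotting bic_by_k against k gives the kind of BIC curve the abstract refers to. The abstract's finding is that this automatic choice alone rarely led to an optimal solution, so in practice the highest-BIC model would be treated as one candidate among several rather than as the final answer.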
