A medoid-based deviation ratio index to determine the number of clusters in a dataset

Most existing methods of determining the number of groups apply to particular data types or are calculated based on the distance matrix for all object pairs. In this paper, we propose a medoid-based Deviation Ratio Index (DRI) to determine the number of clusters. The DRI is calculated based on the d...
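The summary contrasts indices built on the distance matrix for all object pairs with an index built on medoids; according to the full description in this record, the DRI uses the distances from each object to the k final medoids. The following minimal Python sketch illustrates only that input: an n × k object-to-medoid distance matrix, which is smaller than the n × n all-pairs matrix. It does not compute the DRI, whose formula is not given in this record, and it does not reproduce BlockD-KM. The toy points, the choice of Euclidean distance, the preselected medoids, and all function names are assumptions made for illustration.

```python
import math


def object_to_medoid_distances(points, medoid_indices):
    """Return an n x k matrix of Euclidean distances from each object to each medoid."""
    medoids = [points[i] for i in medoid_indices]
    return [[math.dist(p, m) for m in medoids] for p in points]


def assign_to_nearest_medoid(dist_matrix):
    """Label each object with the column index of its closest medoid."""
    return [row.index(min(row)) for row in dist_matrix]


# Hypothetical toy data: two well-separated groups in the plane.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]

# Assumption: the final medoids are already known (objects 0 and 3).
# In the paper they come from BlockD-KM, which is not implemented here.
medoid_indices = [0, 3]

dist = object_to_medoid_distances(points, medoid_indices)
labels = assign_to_nearest_medoid(dist)

# Sum of distances to the nearest medoid: the usual k-medoids objective,
# shown only as an example of a quantity derived from the n x k matrix.
# This is NOT the DRI.
total_deviation = sum(min(row) for row in dist)

print(len(dist), "x", len(dist[0]), "matrix instead of", len(points), "x", len(points))
print(labels)
print(total_deviation)
```

With six objects and two medoids, the matrix holds 12 distances rather than the 36 entries of the all-pairs matrix; the gap widens as the number of objects grows while k stays small.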

Bibliographic Details
Main Authors:	Kariyam; Abdurakhman; Effendie, Adhitya Ronnie
Format:	Online Article Text
Language:	English
Published:	Elsevier 2023
Subjects:	Method Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10011427/ https://www.ncbi.nlm.nih.gov/pubmed/36926268 http://dx.doi.org/10.1016/j.mex.2023.102084

author	Kariyam Abdurakhman Effendie, Adhitya Ronnie
author_sort	Kariyam
collection	PubMed
description	Most existing methods for determining the number of groups apply to particular data types or are calculated from the distance matrix for all object pairs. In this paper, we propose a medoid-based Deviation Ratio Index (DRI) to determine the number of clusters. The DRI is calculated from the distance matrix of each object to the k final medoids. These final medoids are produced by the block-based k-medoids algorithm (BlockD-KM). We choose a specific transformation and a suitable distance for certain variables before executing BlockD-KM. We illustrate the detailed stages of the DRI on secondary data from the 2022 environmental index of Asia Pacific countries, so that they are easy to reproduce. We use eight real datasets, namely Breast Cancer, Heart Disease, Iris, Wine, Soybean, Ionosphere, Vote, and Credit Approval, to validate the DRI method. We compare the DRI method with the Calinski-Harabasz (CH) index and the Silhouette index. The experimental results show that the DRI is 100% correct in predicting the number of clusters, whereas the CH index predicts correctly in 62.5% of cases and the Silhouette index in 75%. We also generated three kinds of artificial data to evaluate the proposed method, and 76.7% of the experiments were predicted correctly. • The medoid-based deviation ratio index aids the researcher in determining the number of clusters; • The DRI method is applicable to any medoid-based partitioning algorithm; • The method is suitable for all data types (categorical, numerical, and mixed).
format	Online Article Text
id	pubmed-10011427
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-10011427 2023-03-15 A medoid-based deviation ratio index to determine the number of clusters in a dataset Kariyam Abdurakhman Effendie, Adhitya Ronnie MethodsX Method Article
Elsevier 2023-02-25 /pmc/articles/PMC10011427/ /pubmed/36926268 http://dx.doi.org/10.1016/j.mex.2023.102084 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title	A medoid-based deviation ratio index to determine the number of clusters in a dataset
topic	Method Article