Cluster Validity Index for Uncertain Data Based on a Probabilistic Distance Measure in Feature Space

Bibliographic Details
Main Authors: Ko, Changwan, Baek, Jaeseung, Tavakkol, Behnam, Jeong, Young-Seon
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10099331/
https://www.ncbi.nlm.nih.gov/pubmed/37050769
http://dx.doi.org/10.3390/s23073708
Collection: PubMed
Description: Cluster validity indices (CVIs), which evaluate clustering results to determine the optimal number of clusters, are critical measures in clustering problems. Most CVIs are designed for typical data objects, called certain data objects. Certain data objects have only a single value and contain no uncertainty, so they are assumed to be information-abundant in the real world. In this study, new CVIs for uncertain data are proposed, based on kernel probabilistic distance measures that calculate the distance between two distributions in feature space; they are designed for uncertain clusters with arbitrary shapes, sub-clusters, and noise in objects. By transforming the original uncertain data into kernel spaces, the proposed CVI accurately measures the compactness and separability of clusters of arbitrary shape and is robust to noise and outliers within a cluster. The proposed CVI was evaluated on diverse simulated and real-life uncertain objects, and the results confirm that the proposed validity indices in feature space outperform pre-existing indices computed in the original space.
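The abstract does not state which kernel probabilistic distance the authors use, so the following is only an illustrative sketch under stated assumptions, not the authors' index. It assumes each uncertain object is represented by a finite set of samples, uses the biased maximum mean discrepancy (MMD) with a Gaussian RBF kernel as a stand-in distance between two distributions in feature space, and plugs that distance into a Dunn-style ratio of separability (smallest between-cluster distance) to compactness (largest within-cluster distance). The names `mmd`, `dunn_index`, `blob`, and the `gamma` bandwidth are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence

# Hypothetical representation (assumption): an uncertain object is a list of
# sample points drawn from its underlying distribution.
Sample = Sequence[float]
UncertainObject = List[Sample]


def rbf_kernel(x: Sample, y: Sample, gamma: float) -> float:
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(a: UncertainObject, b: UncertainObject, gamma: float) -> float:
    """Average kernel over all sample pairs, i.e. <mu_a, mu_b> in feature space."""
    total = sum(rbf_kernel(x, y, gamma) for x in a for y in b)
    return total / (len(a) * len(b))


def mmd(a: UncertainObject, b: UncertainObject, gamma: float = 1.0) -> float:
    """Biased MMD estimate ||mu_a - mu_b|| between kernel mean embeddings.

    Stand-in for the paper's kernel probabilistic distance (assumption).
    """
    sq = mean_kernel(a, a, gamma) + mean_kernel(b, b, gamma) - 2.0 * mean_kernel(a, b, gamma)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def dunn_index(objects: List[UncertainObject], labels: List[int], gamma: float = 1.0) -> float:
    """Hypothetical Dunn-style CVI (higher is better) using MMD between objects."""
    max_diameter = 0.0         # largest within-cluster distance (compactness)
    min_separation = math.inf  # smallest between-cluster distance (separability)
    for i, j in combinations(range(len(objects)), 2):
        d = mmd(objects[i], objects[j], gamma)
        if labels[i] == labels[j]:
            max_diameter = max(max_diameter, d)
        else:
            min_separation = min(min_separation, d)
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point-like object
    return min_separation / max_diameter


def blob(cx: float, cy: float) -> UncertainObject:
    """Hypothetical toy uncertain object: five samples scattered around (cx, cy)."""
    offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
    return [(cx + dx, cy + dy) for dx, dy in offsets]


if __name__ == "__main__":
    objs = [blob(0.0, 0.0), blob(0.2, 0.0), blob(5.0, 5.0), blob(5.2, 5.0)]
    print("grouping nearby objects:", round(dunn_index(objs, [0, 0, 1, 1]), 2))
    print("mixing distant objects: ", round(dunn_index(objs, [0, 1, 0, 1]), 2))
```

Because the distance compares whole sample distributions through their kernel mean embeddings, the partition that groups nearby uncertain objects should score well above the one that mixes distant objects. The RBF bandwidth `gamma` is an untuned assumption; in practice it strongly shapes the feature-space distances and would need to be chosen for the data at hand.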
ID: pubmed-10099331
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Sensors (Basel)
Publication Date: 2023-04-03
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Topic: Article