Cluster Validity Index for Uncertain Data Based on a Probabilistic Distance Measure in Feature Space
Cluster validity indices (CVIs) for evaluating the result of the optimal number of clusters are critical measures in clustering problems. Most CVIs are designed for typical data-type objects called certain data objects. Certain data objects only have a singular value and include no uncertainty, so t...
Main authors: | Ko, Changwan; Baek, Jaeseung; Tavakkol, Behnam; Jeong, Young-Seon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10099331/ https://www.ncbi.nlm.nih.gov/pubmed/37050769 http://dx.doi.org/10.3390/s23073708 |
_version_ | 1785025028220256256 |
---|---|
author | Ko, Changwan Baek, Jaeseung Tavakkol, Behnam Jeong, Young-Seon |
author_facet | Ko, Changwan Baek, Jaeseung Tavakkol, Behnam Jeong, Young-Seon |
author_sort | Ko, Changwan |
collection | PubMed |
description | Cluster validity indices (CVIs) for evaluating the result of the optimal number of clusters are critical measures in clustering problems. Most CVIs are designed for typical data-type objects called certain data objects. Certain data objects only have a singular value and include no uncertainty, so they are assumed to be information-abundant in the real world. In this study, new CVIs for uncertain data, based on kernel probabilistic distance measures to calculate the distance between two distributions in feature space, are proposed for uncertain clusters with arbitrary shapes, sub-clusters, and noise in objects. By transforming original uncertain data into kernel spaces, the proposed CVI accurately measures the compactness and separability of a cluster for arbitrary cluster shapes and is robust to noise and outliers in a cluster. The proposed CVI was evaluated for diverse types of simulated and real-life uncertain objects, confirming that the proposed validity indexes in feature space outperform the pre-existing ones in the original space. |
format | Online Article Text |
id | pubmed-10099331 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-100993312023-04-14 Cluster Validity Index for Uncertain Data Based on a Probabilistic Distance Measure in Feature Space Ko, Changwan Baek, Jaeseung Tavakkol, Behnam Jeong, Young-Seon Sensors (Basel) Article Cluster validity indices (CVIs) for evaluating the result of the optimal number of clusters are critical measures in clustering problems. Most CVIs are designed for typical data-type objects called certain data objects. Certain data objects only have a singular value and include no uncertainty, so they are assumed to be information-abundant in the real world. In this study, new CVIs for uncertain data, based on kernel probabilistic distance measures to calculate the distance between two distributions in feature space, are proposed for uncertain clusters with arbitrary shapes, sub-clusters, and noise in objects. By transforming original uncertain data into kernel spaces, the proposed CVI accurately measures the compactness and separability of a cluster for arbitrary cluster shapes and is robust to noise and outliers in a cluster. The proposed CVI was evaluated for diverse types of simulated and real-life uncertain objects, confirming that the proposed validity indexes in feature space outperform the pre-existing ones in the original space. MDPI 2023-04-03 /pmc/articles/PMC10099331/ /pubmed/37050769 http://dx.doi.org/10.3390/s23073708 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Ko, Changwan Baek, Jaeseung Tavakkol, Behnam Jeong, Young-Seon Cluster Validity Index for Uncertain Data Based on a Probabilistic Distance Measure in Feature Space |
title | Cluster Validity Index for Uncertain Data Based on a Probabilistic Distance Measure in Feature Space |
title_full | Cluster Validity Index for Uncertain Data Based on a Probabilistic Distance Measure in Feature Space |
title_fullStr | Cluster Validity Index for Uncertain Data Based on a Probabilistic Distance Measure in Feature Space |
title_full_unstemmed | Cluster Validity Index for Uncertain Data Based on a Probabilistic Distance Measure in Feature Space |
title_short | Cluster Validity Index for Uncertain Data Based on a Probabilistic Distance Measure in Feature Space |
title_sort | cluster validity index for uncertain data based on a probabilistic distance measure in feature space |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10099331/ https://www.ncbi.nlm.nih.gov/pubmed/37050769 http://dx.doi.org/10.3390/s23073708 |
work_keys_str_mv | AT kochangwan clustervalidityindexforuncertaindatabasedonaprobabilisticdistancemeasureinfeaturespace AT baekjaeseung clustervalidityindexforuncertaindatabasedonaprobabilisticdistancemeasureinfeaturespace AT tavakkolbehnam clustervalidityindexforuncertaindatabasedonaprobabilisticdistancemeasureinfeaturespace AT jeongyoungseon clustervalidityindexforuncertaindatabasedonaprobabilisticdistancemeasureinfeaturespace |