
A unified approach for cluster-wise and general noise rejection approaches for k-means clustering

Hard C-means (HCM; k-means) is one of the most widely used partitive clustering techniques. However, HCM is strongly affected by noise objects and cannot represent cluster overlap. To reduce the influence of noise objects, objects distant from cluster centers are rejected in some noise rejection app...


Bibliographic Details
Main author: Ubukata, Seiki
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2019
Subjects: Data Mining and Machine Learning
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924505/
https://www.ncbi.nlm.nih.gov/pubmed/33816891
http://dx.doi.org/10.7717/peerj-cs.238
author Ubukata, Seiki
collection PubMed
description Hard C-means (HCM; k-means) is one of the most widely used partitive clustering techniques. However, HCM is strongly affected by noise objects and cannot represent cluster overlap. To reduce the influence of noise objects, objects distant from cluster centers are rejected in some noise rejection approaches, including general noise rejection (GNR) and cluster-wise noise rejection (CNR). Generalized rough C-means (GRCM) can deal with positive, negative, and boundary belonging of objects to clusters by reference to rough set theory. GRCM realizes cluster overlap through linear function threshold-based object-cluster assignment. In this study, as a unified approach for GNR and CNR in HCM, we propose linear function threshold-based C-means (LiFTCM) by relaxing GRCM. We show that the linear function threshold-based assignment in LiFTCM includes GNR, CNR, and their combinations as well as the rough assignment of GRCM. The classification boundary is visualized so that the characteristics of LiFTCM in various parameter settings are clarified. Numerical experiments demonstrate that the combinations of rough clustering or the combinations of GNR and CNR realized by LiFTCM yield satisfactory results.
format Online
Article
Text
id pubmed-7924505
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7924505 2021-04-02 Ubukata, Seiki PeerJ Comput Sci Data Mining and Machine Learning PeerJ Inc. 2019-11-18 /pmc/articles/PMC7924505/ /pubmed/33816891 http://dx.doi.org/10.7717/peerj-cs.238 Text en ©2019 Ubukata https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title A unified approach for cluster-wise and general noise rejection approaches for k-means clustering
topic Data Mining and Machine Learning