A Novel Model on Reinforce K-Means Using Location Division Model and Outlier of Initial Value for Lowering Data Cost

Today, semi-structured and unstructured data are mainly collected and analyzed for data analysis applicable to various systems. Such data have a dense distribution of space and usually contain outliers and noise data. There have been ongoing research studies on clustering algorithms to classify such...

Bibliographic Details
Main authors:	Jung, Se-Hoon, Lee, Hansung, Huh, Jun-Ho
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7517527/ https://www.ncbi.nlm.nih.gov/pubmed/33286671 http://dx.doi.org/10.3390/e22080902

author	Jung, Se-Hoon Lee, Hansung Huh, Jun-Ho
collection	PubMed
description	Today, semi-structured and unstructured data are the main kinds of data collected and analyzed for applications across various systems. Such data are densely distributed in space and usually contain outliers and noise. Clustering algorithms for classifying such data, including outliers and noise, have been studied continuously, and the K-means algorithm is one of the most investigated. Researchers have pointed out several problems with it: the number of clusters, K, is chosen arbitrarily by the analyst; the connection of nodes in dense data produces biased classification results; and implementation costs rise and accuracy falls depending on how the initial centroids are selected. Most K-means researchers have also noted that, when K is too large or too small, outliers end up in external or other clusters rather than the ones they belong to. The present study therefore analyzed the problems with initial-centroid selection in the existing K-means algorithm and investigated a new K-means algorithm for selecting initial centroids. It proposed a method that reduces clustering computation costs by choosing initial center points based on space division and outliers, so as to lower the dependence of objects on the initial cluster centers. Because outliers can lead to inappropriate results when they are reflected in the choice of a cluster's center point, the study also proposed an algorithm that minimizes the error rate due to outliers, based on an improved algorithm for space division and distance measurement. Performance experiments show that the proposed algorithm lowered execution costs by about 13–14% compared with previous studies as the volume of clustering data or the number of clusters increased. It also recorded a lower frequency of outliers, a lower effectiveness index (which assesses performance deterioration due to outliers), and a reduction in outliers of about 60%.
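The abstract does not give the algorithm's details, so the following Python sketch is only an illustration of the general idea it describes: divide the data space into cells, set aside outliers, and seed K-means from the remaining dense regions. It is not the authors' method. The function names, the uniform grid (`cells_per_axis`), and the z-score cutoff (`z_cut`) are all assumptions introduced here.

```python
import math
from collections import defaultdict


def grid_initial_centroids(points, k, cells_per_axis=4, z_cut=2.0):
    """Return k seed centroids taken from the densest grid cells.

    Assumed outlier rule (not from the paper): a point is an outlier when
    the z-score of its distance to the global mean exceeds z_cut.
    """
    n, dims = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dims)]
    dists = [math.dist(p, mean) for p in points]
    mu = sum(dists) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in dists) / n)
    inliers = [p for p, x in zip(points, dists)
               if sigma == 0 or (x - mu) / sigma <= z_cut]

    # Space division: bucket the inliers into a uniform grid.
    lo = [min(p[d] for p in inliers) for d in range(dims)]
    hi = [max(p[d] for p in inliers) for d in range(dims)]
    cells = defaultdict(list)
    for p in inliers:
        key = tuple(
            min(cells_per_axis - 1,
                int((p[d] - lo[d]) / ((hi[d] - lo[d]) or 1.0) * cells_per_axis))
            for d in range(dims)
        )
        cells[key].append(p)
    if len(cells) < k:
        raise ValueError("fewer occupied cells than k; use a finer grid")

    # Seed each centroid at the mean of one of the k most populated cells.
    densest = sorted(cells.values(), key=len, reverse=True)[:k]
    return [tuple(sum(p[d] for p in cell) / len(cell) for d in range(dims))
            for cell in densest]


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else old
            for g, old in zip(groups, centroids)
        ]
    return centroids
```

For example, calling `grid_initial_centroids(points, k=2)` on two tight groups of points plus one distant point seeds both centroids inside the groups, because the distant point is filtered out before the grid is built. Whether the iterations then run on all points or only on the inliers is a separate design choice that the abstract does not settle.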
format	Online Article Text
id	pubmed-7517527
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7517527 2020-11-09 Entropy (Basel) Article MDPI 2020-08-17 /pmc/articles/PMC7517527/ /pubmed/33286671 http://dx.doi.org/10.3390/e22080902 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	A Novel Model on Reinforce K-Means Using Location Division Model and Outlier of Initial Value for Lowering Data Cost
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7517527/ https://www.ncbi.nlm.nih.gov/pubmed/33286671 http://dx.doi.org/10.3390/e22080902