
An Empirical Analysis of Rough Set Categorical Clustering Techniques

Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been given to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Ma...


Bibliographic Details
Main authors:	Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5222507/ https://www.ncbi.nlm.nih.gov/pubmed/28068344 http://dx.doi.org/10.1371/journal.pone.0164803

author	Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat
collection	PubMed
description	Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been given to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed their predecessors, such as Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or have difficulty selecting, their best clustering attribute. This analysis motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA), which clusters categorical data using rough set indiscernibility relations, is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with a number of clusters. To show the significance of the proposed approach, the effect of the number of clusters on rough accuracy, purity and entropy is described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, presented in tabular and graphical form, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy.
format	Online Article Text
id	pubmed-5222507
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5222507 2017-01-19. An Empirical Analysis of Rough Set Categorical Clustering Techniques. Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat. PLoS One. Research Article. Public Library of Science, 2017-01-09. /pmc/articles/PMC5222507/ /pubmed/28068344 http://dx.doi.org/10.1371/journal.pone.0164803 Text en. © 2017 Uddin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	An Empirical Analysis of Rough Set Categorical Clustering Techniques
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5222507/ https://www.ncbi.nlm.nih.gov/pubmed/28068344 http://dx.doi.org/10.1371/journal.pone.0164803
