
An Empirical Analysis of Rough Set Categorical Clustering Techniques

Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering the rough set based approaches such as Ma...


Bibliographic Details
Main authors: Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5222507/
https://www.ncbi.nlm.nih.gov/pubmed/28068344
http://dx.doi.org/10.1371/journal.pone.0164803
author Uddin, Jamal
Ghazali, Rozaida
Deris, Mustafa Mat
collection PubMed
description Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed their predecessors, such as Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or face difficulty in selecting, their best clustering attribute. This analysis motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernibility relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with a number of clusters. To show the significance of the proposed approach, the effects of the number of clusters on rough accuracy, purity and entropy are described in the form of propositions. Moreover, ten different data sets from previously used research cases and the UCI repository are used for experiments. The results, presented in tabular and graphical form, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy.
format Online
Article
Text
id pubmed-5222507
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5222507 2017-01-19 An Empirical Analysis of Rough Set Categorical Clustering Techniques. Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat. PLoS One. Research Article.
Public Library of Science 2017-01-09 /pmc/articles/PMC5222507/ /pubmed/28068344 http://dx.doi.org/10.1371/journal.pone.0164803 Text en © 2017 Uddin et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title An Empirical Analysis of Rough Set Categorical Clustering Techniques
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5222507/
https://www.ncbi.nlm.nih.gov/pubmed/28068344
http://dx.doi.org/10.1371/journal.pone.0164803