An Empirical Analysis of Rough Set Categorical Clustering Techniques
Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Ma...
Main authors: | Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5222507/ https://www.ncbi.nlm.nih.gov/pubmed/28068344 http://dx.doi.org/10.1371/journal.pone.0164803 |
_version_ | 1782493029915951104 |
---|---|
author | Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat |
author_facet | Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat |
author_sort | Uddin, Jamal |
collection | PubMed |
description | Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed their predecessors, such as Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or face difficulty in selecting, their best clustering attribute. This analysis therefore motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernibility relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with the number of clusters. To show the significance of the proposed approach, the effect of the number of clusters on rough accuracy, purity and entropy is described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, presented in tabular and graphical forms, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy. |
format | Online Article Text |
id | pubmed-5222507 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5222507 2017-01-19 An Empirical Analysis of Rough Set Categorical Clustering Techniques Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat PLoS One Research Article
Public Library of Science 2017-01-09 /pmc/articles/PMC5222507/ /pubmed/28068344 http://dx.doi.org/10.1371/journal.pone.0164803 Text en © 2017 Uddin et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Uddin, Jamal Ghazali, Rozaida Deris, Mustafa Mat An Empirical Analysis of Rough Set Categorical Clustering Techniques |
title | An Empirical Analysis of Rough Set Categorical Clustering Techniques |
title_full | An Empirical Analysis of Rough Set Categorical Clustering Techniques |
title_fullStr | An Empirical Analysis of Rough Set Categorical Clustering Techniques |
title_full_unstemmed | An Empirical Analysis of Rough Set Categorical Clustering Techniques |
title_short | An Empirical Analysis of Rough Set Categorical Clustering Techniques |
title_sort | empirical analysis of rough set categorical clustering techniques |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5222507/ https://www.ncbi.nlm.nih.gov/pubmed/28068344 http://dx.doi.org/10.1371/journal.pone.0164803 |
work_keys_str_mv | AT uddinjamal anempiricalanalysisofroughsetcategoricalclusteringtechniques AT ghazalirozaida anempiricalanalysisofroughsetcategoricalclusteringtechniques AT derismustafamat anempiricalanalysisofroughsetcategoricalclusteringtechniques AT uddinjamal empiricalanalysisofroughsetcategoricalclusteringtechniques AT ghazalirozaida empiricalanalysisofroughsetcategoricalclusteringtechniques AT derismustafamat empiricalanalysisofroughsetcategoricalclusteringtechniques |
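
The description mentions the indiscernibility relation and the purity and entropy measures without defining them. The following is a minimal Python sketch of those three ideas under common textbook definitions, which may differ in detail from the ones used in the article. It is not the MIA, MDA or MSA algorithm. The function names, the toy data set and the class labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def indiscernibility_classes(rows, attrs):
    """Partition object indices into classes whose members agree on every
    attribute in `attrs` (the equivalence classes of the indiscernibility
    relation induced by `attrs`)."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def purity(clusters, labels):
    """Fraction of objects that carry the majority label of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(max(Counter(labels[i] for i in c).values()) for c in clusters)
    return majority / total


def entropy(clusters, labels):
    """Size-weighted average of the label entropy (in bits) of each cluster."""
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        counts = Counter(labels[i] for i in c)
        h = -sum((n / len(c)) * log2(n / len(c)) for n in counts.values())
        result += (len(c) / total) * h
    return result


# Hypothetical toy data: four objects, two categorical attributes.
rows = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "large"},
    {"color": "blue", "size": "small"},
    {"color": "blue", "size": "small"},
]
labels = ["a", "a", "b", "b"]  # hypothetical ground-truth classes

by_color = indiscernibility_classes(rows, ["color"])
by_size = indiscernibility_classes(rows, ["size"])
print(by_color, purity(by_color, labels), entropy(by_color, labels))
print(by_size, purity(by_size, labels), entropy(by_size, labels))
```

In this toy example, partitioning on `color` reproduces the labels exactly, while partitioning on `size` mixes them. Higher purity and lower entropy indicate the better clustering attribute under these measures.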