
A Hierarchical Gamma Mixture Model-Based Method for Classification of High-Dimensional Data


Bibliographic Details
Main authors: Azhar, Muhammad; Li, Mark Junjie; Zhexue Huang, Joshua
Format: Online article, text
Language: English
Published: MDPI, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7515435/
http://dx.doi.org/10.3390/e21090906
Collection: PubMed
Description: Data classification is an important research topic in the field of data mining. With the rapid growth of social media sites and IoT devices, data have grown tremendously in volume and complexity, producing large amounts of complex, high-dimensional data. Classifying such high-dimensional data when there are many classes has been a major challenge for current state-of-the-art methods. This paper presents a novel, hierarchical, gamma mixture model-based unsupervised method for classifying high-dimensional data with a large number of classes. In this method, we first partition the features of the data set into feature strata by using k-means. Next, a set of subspace data sets is generated from the feature strata by using the stratified subspace sampling method. The GMM Tree algorithm is then used to identify the number of clusters and the initial cluster centers in each subspace data set, and these initial centers are passed to k-means to generate the base subspace clustering results. The base subspace clustering results are then integrated into an object cluster association (OCA) matrix by using the link-based method. The ensemble clustering result is generated from the OCA matrix by the k-means algorithm, with the number of clusters identified by the GMM Tree algorithm. After the ensemble clustering result is produced, the purity of each cluster is computed and the cluster is assigned its dominant class label. A new object is classified by computing the distance between it and the center of each cluster in the classifier, and the new object is assigned the class label of the nearest cluster. A series of experiments was conducted on twelve synthetic and eight real-world data sets with different numbers of classes, features, and objects. The experimental results show that the new method outperforms other state-of-the-art classification techniques on most of the data sets.
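The last stages of the method described above (building the OCA matrix, labeling clusters by purity, and nearest-center classification) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the base subspace clusterings and the ensemble clustering are already available as lists of integer cluster ids, so the feature stratification, GMM Tree, and k-means steps are not shown. It builds only a binary OCA matrix, omitting the link-based refinement. It also computes cluster centers in the original feature space and uses Euclidean distance. The function names `binary_oca`, `label_clusters`, and `classify`, as well as the toy data, are hypothetical.

```python
from collections import Counter
from math import dist


def binary_oca(base_clusterings):
    """Build a binary object-cluster association (OCA) matrix.

    base_clusterings: one list of cluster ids per base (subspace) clustering,
    each assigning every object to a cluster. Row i has 1.0 in the column of
    every base cluster containing object i. ASSUMPTION: the link-based method
    also fills 0 entries with cluster-similarity scores; that is NOT done here.
    """
    columns = []  # one (clustering index, cluster id) pair per OCA column
    for c, labels in enumerate(base_clusterings):
        for cid in sorted(set(labels)):
            columns.append((c, cid))
    n_objects = len(base_clusterings[0])
    return [
        [1.0 if base_clusterings[c][i] == cid else 0.0 for (c, cid) in columns]
        for i in range(n_objects)
    ]


def label_clusters(points, cluster_ids, class_labels):
    """Return {cluster id: (center, dominant class label, purity)}.

    ASSUMPTION: centers are means of the original feature vectors.
    """
    members = {}
    for p, k, y in zip(points, cluster_ids, class_labels):
        members.setdefault(k, []).append((p, y))
    model = {}
    for k, items in members.items():
        pts = [p for p, _ in items]
        center = [sum(col) / len(pts) for col in zip(*pts)]
        label, count = Counter(y for _, y in items).most_common(1)[0]
        model[k] = (center, label, count / len(items))  # purity = share of dominant label
    return model


def classify(model, x):
    """Assign x the dominant label of the cluster whose center is nearest."""
    nearest = min(model, key=lambda k: dist(model[k][0], x))
    return model[nearest][1]


# Toy example (made-up data): five 2-D objects, two base clusterings, and an
# ensemble assignment assumed to come from running k-means on the OCA matrix.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
classes = ["a", "a", "b", "b", "a"]
oca = binary_oca([[0, 0, 1, 1, 1], [0, 0, 1, 1, 0]])
ensemble = [0, 0, 1, 1, 1]
model = label_clusters(points, ensemble, classes)
print(classify(model, (4.5, 4.5)))  # b
```

In the toy run, the second ensemble cluster has purity 2/3 because two of its three members are class `b`, so it is labeled `b`. The query point (4.5, 4.5) lies nearest to that cluster's center and is therefore classified as `b`.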
Record ID: pubmed-7515435
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Entropy (Basel), published 2019-09-18 by MDPI.
License: © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).