Feature selection and semi-supervised clustering using multiobjective optimization
In this paper we couple the feature selection problem with semi-supervised clustering. Semi-supervised clustering combines information from unsupervised and supervised learning to overcome the limitations of each. But in general all the features present in the data set may not be i...
Main Authors: | Saha, Sriparna; Ekbal, Asif; Alok, Abhay Kumar; Spandana, Rachamadugu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2014 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4174553/ https://www.ncbi.nlm.nih.gov/pubmed/25279282 http://dx.doi.org/10.1186/2193-1801-3-465 |
_version_ | 1782336361008726016 |
---|---|
author | Saha, Sriparna; Ekbal, Asif; Alok, Abhay Kumar; Spandana, Rachamadugu |
author_sort | Saha, Sriparna |
collection | PubMed |
description | In this paper we couple the feature selection problem with semi-supervised clustering. Semi-supervised clustering combines information from unsupervised and supervised learning to overcome the limitations of each. In general, however, not all features present in a data set are important for clustering, so selecting an appropriate subset of features is highly relevant from a clustering point of view. We solve the problem of automatic feature selection and semi-supervised clustering using multiobjective optimization. A recently developed simulated-annealing-based multiobjective optimization technique, archived multiobjective simulated annealing (AMOSA), is used as the underlying optimizer. Features and cluster centers are encoded in the form of a string. For each data set, we assume that class label information is known for 10% of the data points. Four objectives are optimized using the search capability of AMOSA: two internal cluster validity indices reflecting different data properties, an external cluster validity index measuring the similarity between the obtained partitioning and the true labelling of the 10% labelled data points, and a measure counting the number of features present in a particular string. AMOSA is used to detect the appropriate subset of features, the appropriate number of clusters, and the appropriate partitioning for any given data set. The effectiveness of the proposed semi-supervised feature selection technique, compared with existing techniques, is shown on seven real-life data sets of varying complexity. |
format | Online Article Text |
id | pubmed-4174553 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-4174553 2014-10-02 Feature selection and semi-supervised clustering using multiobjective optimization Saha, Sriparna Ekbal, Asif Alok, Abhay Kumar Spandana, Rachamadugu Springerplus Research Springer International Publishing 2014-08-26 /pmc/articles/PMC4174553/ /pubmed/25279282 http://dx.doi.org/10.1186/2193-1801-3-465 Text en © Saha et al.; licensee Springer. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. |
title | Feature selection and semi-supervised clustering using multiobjective optimization |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4174553/ https://www.ncbi.nlm.nih.gov/pubmed/25279282 http://dx.doi.org/10.1186/2193-1801-3-465 |
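The abstract in this record says that features and cluster centers are encoded together in a string, and that four objectives are optimized at once. The sketch below illustrates one way such a string could be decoded and scored. It is not the authors' implementation: the string layout, the function names, the compactness and separation measures standing in for the two internal validity indices, the pairwise agreement score standing in for the external index, and the choice to minimize the feature count are all assumptions made for illustration. The AMOSA search itself is not shown; only the Pareto dominance test that any multiobjective optimizer relies on.

```python
import math

def decode(string, n_features):
    """Split a flat string into a feature mask and cluster centers.

    Assumed (hypothetical) layout: n_features 0/1 flags, followed by
    K centers of n_features coordinates each.
    """
    mask = [bool(b) for b in string[:n_features]]
    flat = string[n_features:]
    centers = [flat[i:i + n_features] for i in range(0, len(flat), n_features)]
    return mask, centers

def masked_distance(x, c, mask):
    """Euclidean distance restricted to the selected features."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci, m in zip(x, c, mask) if m))

def assign(points, mask, centers):
    """Assign every point to its nearest center."""
    return [min(range(len(centers)),
                key=lambda k: masked_distance(p, centers[k], mask))
            for p in points]

def pair_agreement(labels, known):
    """Fraction of labelled pairs on which clustering and true classes agree.

    `known` maps a point index to its true class and plays the role of
    the 10% of points whose class information is available.
    """
    idx = sorted(known)
    agree = total = 0
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            agree += (labels[i] == labels[j]) == (known[i] == known[j])
            total += 1
    return agree / total if total else 0.0

def objectives(string, points, known, n_features):
    """Return a 4-tuple of objectives, all expressed as 'smaller is better'."""
    mask, centers = decode(string, n_features)
    labels = assign(points, mask, centers)
    compactness = sum(masked_distance(p, centers[l], mask)
                      for p, l in zip(points, labels))
    separation = min(masked_distance(centers[a], centers[b], mask)
                     for a in range(len(centers))
                     for b in range(a + 1, len(centers)))
    return (compactness, -separation, -pair_agreement(labels, known), sum(mask))

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Toy data: the third feature is noise; class labels are known for two points only.
points = [(0, 0, 5), (0, 1, 9), (10, 10, 1), (10, 11, 7)]
known = {0: 0, 2: 1}
string_a = [1, 1, 0, 0, 0.5, 0, 10, 10.5, 0]  # drops the noisy feature
string_b = [1, 1, 1, 0, 0.5, 0, 10, 10.5, 0]  # keeps all three features
fa = objectives(string_a, points, known, 3)
fb = objectives(string_b, points, known, 3)
print(fa, fb, dominates(fa, fb))
```

In this toy case the two strings use the same centers and differ only in the feature mask, so comparing their objective vectors shows how the choice of features alone can change the trade-off. An optimizer such as AMOSA would keep an archive of mutually non-dominated strings rather than a single best one.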