
Feature selection and semi-supervised clustering using multiobjective optimization

In this paper we couple the feature selection problem with semi-supervised clustering. Semi-supervised clustering combines information from unsupervised and supervised learning to overcome the limitations of each. But in general all the features present in the data set may not be i...

Bibliographic Details
Main authors: Saha, Sriparna; Ekbal, Asif; Alok, Abhay Kumar; Spandana, Rachamadugu
Format: Online Article Text
Language: English
Published: Springer International Publishing 2014
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4174553/
https://www.ncbi.nlm.nih.gov/pubmed/25279282
http://dx.doi.org/10.1186/2193-1801-3-465
author Saha, Sriparna
Ekbal, Asif
Alok, Abhay Kumar
Spandana, Rachamadugu
collection PubMed
description In this paper we couple the feature selection problem with semi-supervised clustering. Semi-supervised clustering combines information from unsupervised and supervised learning to overcome the limitations of each. In general, however, not all features present in a data set are important for clustering, so selecting an appropriate subset of features is highly relevant from a clustering point of view. We solve the problem of automatic feature selection and semi-supervised clustering using multiobjective optimization. A recently developed simulated-annealing-based multiobjective optimization technique, archived multiobjective simulated annealing (AMOSA), is used as the underlying optimization technique. Features and cluster centers are encoded in the form of a string. We assume that class label information is known for 10% of the data points in each data set. Four objectives are optimized using the search capability of AMOSA: two internal cluster validity indices reflecting different data properties, an external cluster validity index measuring the similarity between the obtained partitioning and the true labelling of the 10% labelled data points, and a measure counting the number of features present in a particular string. AMOSA is used to detect the appropriate subset of features, the appropriate number of clusters, and the appropriate partitioning from any given data set. The effectiveness of the proposed semi-supervised feature selection technique, compared with existing techniques, is shown for seven real-life data sets of varying complexity.
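The encoding and objectives summarized in the description can be illustrated with a small sketch. This is not the authors' implementation: the record does not name the validity indices, so a within-cluster squared distance stands in for the internal indices (one instead of two) and a plain Rand index stands in for the external index. The names `assign`, `rand_index`, `objectives` and `dominates` are hypothetical, the AMOSA search itself is omitted, and minimizing the feature count is an assumption, since the description only says that the count is optimized.

```python
from itertools import combinations

def assign(points, centers, mask):
    """Assign each point to its nearest center, using only the selected features."""
    def dist2(p, c):
        return sum((p[j] - c[j]) ** 2 for j in range(len(mask)) if mask[j])
    return [min(range(len(centers)), key=lambda k: dist2(p, centers[k])) for p in points]

def rand_index(pred, true):
    """Fraction of point pairs on which two labellings agree (plain Rand index)."""
    pairs = list(combinations(range(len(true)), 2))
    agree = sum((pred[i] == pred[j]) == (true[i] == true[j]) for i, j in pairs)
    return agree / len(pairs)

def objectives(solution, points, labelled):
    """Objective vector to minimize: (compactness, -agreement, feature count).

    solution is (mask, centers): a 0/1 feature mask plus a list of cluster
    centers, i.e. the 'string' holding features and centers together.
    labelled maps point index -> known class for the labelled subset.
    """
    mask, centers = solution
    labels = assign(points, centers, mask)
    # Stand-in internal index (assumption): total squared distance to assigned center.
    compactness = sum(
        sum((p[j] - centers[k][j]) ** 2 for j in range(len(mask)) if mask[j])
        for p, k in zip(points, labels)
    )
    # Stand-in external index (assumption): Rand index on the labelled points only.
    idx = sorted(labelled)
    agreement = rand_index([labels[i] for i in idx], [labelled[i] for i in idx])
    return (compactness, -agreement, sum(mask))

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Toy data: the third feature is noise; two points carry known class labels.
points = [(0.0, 0.0, 5.0), (0.1, 0.2, -3.0), (4.0, 4.1, 4.0), (4.2, 3.9, -4.0)]
labelled = {0: 0, 2: 1}
centers = [(0.0, 0.0, 0.0), (4.0, 4.0, 0.0)]

fa = objectives(((1, 1, 0), centers), points, labelled)  # noise feature dropped
fb = objectives(((1, 1, 1), centers), points, labelled)  # all features kept
print(fa, fb, dominates(fa, fb))
```

A multiobjective search such as AMOSA keeps an archive of mutually non-dominated solutions, and a check like `dominates` is what such an archive relies on. Note that squared distances computed over fewer features are naturally smaller, so this stand-in compactness favours small feature subsets and should be read as illustration only.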
format Online
Article
Text
id pubmed-4174553
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-4174553 2014-10-02 Feature selection and semi-supervised clustering using multiobjective optimization Saha, Sriparna; Ekbal, Asif; Alok, Abhay Kumar; Spandana, Rachamadugu Springerplus Research
Springer International Publishing 2014-08-26 /pmc/articles/PMC4174553/ /pubmed/25279282 http://dx.doi.org/10.1186/2193-1801-3-465 Text en © Saha et al.; licensee Springer. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title Feature selection and semi-supervised clustering using multiobjective optimization
topic Research