Cargando…

A Fast Clustering Algorithm for Data with a Few Labeled Instances

The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects...

Descripción completa

Detalles Bibliográficos
Autores principales: Yang, Jinfeng, Xiao, Yong, Wang, Jiabing, Ma, Qianli, Shen, Yanhua
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Hindawi Publishing Corporation 2015
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4377383/
https://www.ncbi.nlm.nih.gov/pubmed/25861252
http://dx.doi.org/10.1155/2015/196098
_version_ 1782363896692080640
author Yang, Jinfeng
Xiao, Yong
Wang, Jiabing
Ma, Qianli
Shen, Yanhua
author_facet Yang, Jinfeng
Xiao, Yong
Wang, Jiabing
Ma, Qianli
Shen, Yanhua
author_sort Yang, Jinfeng
collection PubMed
description The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learn a distance metric to make the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than a semidefinite programming model used by most of existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality.
format Online
Article
Text
id pubmed-4377383
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-43773832015-04-08 A Fast Clustering Algorithm for Data with a Few Labeled Instances Yang, Jinfeng Xiao, Yong Wang, Jiabing Ma, Qianli Shen, Yanhua Comput Intell Neurosci Research Article The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learn a distance metric to make the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than a semidefinite programming model used by most of existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality. Hindawi Publishing Corporation 2015 2015-03-11 /pmc/articles/PMC4377383/ /pubmed/25861252 http://dx.doi.org/10.1155/2015/196098 Text en Copyright © 2015 Jinfeng Yang et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Yang, Jinfeng
Xiao, Yong
Wang, Jiabing
Ma, Qianli
Shen, Yanhua
A Fast Clustering Algorithm for Data with a Few Labeled Instances
title A Fast Clustering Algorithm for Data with a Few Labeled Instances
title_full A Fast Clustering Algorithm for Data with a Few Labeled Instances
title_fullStr A Fast Clustering Algorithm for Data with a Few Labeled Instances
title_full_unstemmed A Fast Clustering Algorithm for Data with a Few Labeled Instances
title_short A Fast Clustering Algorithm for Data with a Few Labeled Instances
title_sort fast clustering algorithm for data with a few labeled instances
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4377383/
https://www.ncbi.nlm.nih.gov/pubmed/25861252
http://dx.doi.org/10.1155/2015/196098
work_keys_str_mv AT yangjinfeng afastclusteringalgorithmfordatawithafewlabeledinstances
AT xiaoyong afastclusteringalgorithmfordatawithafewlabeledinstances
AT wangjiabing afastclusteringalgorithmfordatawithafewlabeledinstances
AT maqianli afastclusteringalgorithmfordatawithafewlabeledinstances
AT shenyanhua afastclusteringalgorithmfordatawithafewlabeledinstances
AT yangjinfeng fastclusteringalgorithmfordatawithafewlabeledinstances
AT xiaoyong fastclusteringalgorithmfordatawithafewlabeledinstances
AT wangjiabing fastclusteringalgorithmfordatawithafewlabeledinstances
AT maqianli fastclusteringalgorithmfordatawithafewlabeledinstances
AT shenyanhua fastclusteringalgorithmfordatawithafewlabeledinstances