Cargando…
A Fast Clustering Algorithm for Data with a Few Labeled Instances
The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Hindawi Publishing Corporation
2015
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4377383/ https://www.ncbi.nlm.nih.gov/pubmed/25861252 http://dx.doi.org/10.1155/2015/196098 |
_version_ | 1782363896692080640 |
---|---|
author | Yang, Jinfeng Xiao, Yong Wang, Jiabing Ma, Qianli Shen, Yanhua |
author_facet | Yang, Jinfeng Xiao, Yong Wang, Jiabing Ma, Qianli Shen, Yanhua |
author_sort | Yang, Jinfeng |
collection | PubMed |
description | The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learn a distance metric to make the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than a semidefinite programming model used by most of existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality. |
format | Online Article Text |
id | pubmed-4377383 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-43773832015-04-08 A Fast Clustering Algorithm for Data with a Few Labeled Instances Yang, Jinfeng Xiao, Yong Wang, Jiabing Ma, Qianli Shen, Yanhua Comput Intell Neurosci Research Article The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learn a distance metric to make the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than a semidefinite programming model used by most of existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality. Hindawi Publishing Corporation 2015 2015-03-11 /pmc/articles/PMC4377383/ /pubmed/25861252 http://dx.doi.org/10.1155/2015/196098 Text en Copyright © 2015 Jinfeng Yang et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Yang, Jinfeng Xiao, Yong Wang, Jiabing Ma, Qianli Shen, Yanhua A Fast Clustering Algorithm for Data with a Few Labeled Instances |
title | A Fast Clustering Algorithm for Data with a Few Labeled Instances |
title_full | A Fast Clustering Algorithm for Data with a Few Labeled Instances |
title_fullStr | A Fast Clustering Algorithm for Data with a Few Labeled Instances |
title_full_unstemmed | A Fast Clustering Algorithm for Data with a Few Labeled Instances |
title_short | A Fast Clustering Algorithm for Data with a Few Labeled Instances |
title_sort | fast clustering algorithm for data with a few labeled instances |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4377383/ https://www.ncbi.nlm.nih.gov/pubmed/25861252 http://dx.doi.org/10.1155/2015/196098 |
work_keys_str_mv | AT yangjinfeng afastclusteringalgorithmfordatawithafewlabeledinstances AT xiaoyong afastclusteringalgorithmfordatawithafewlabeledinstances AT wangjiabing afastclusteringalgorithmfordatawithafewlabeledinstances AT maqianli afastclusteringalgorithmfordatawithafewlabeledinstances AT shenyanhua afastclusteringalgorithmfordatawithafewlabeledinstances AT yangjinfeng fastclusteringalgorithmfordatawithafewlabeledinstances AT xiaoyong fastclusteringalgorithmfordatawithafewlabeledinstances AT wangjiabing fastclusteringalgorithmfordatawithafewlabeledinstances AT maqianli fastclusteringalgorithmfordatawithafewlabeledinstances AT shenyanhua fastclusteringalgorithmfordatawithafewlabeledinstances |