SVM clustering


Bibliographic Details

Main Authors: Winters-Hilt, Stephen, Merat, Sam
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2099486/
https://www.ncbi.nlm.nih.gov/pubmed/18047717
http://dx.doi.org/10.1186/1471-2105-8-S7-S18
_version_ 1782138316499451904
author Winters-Hilt, Stephen
Merat, Sam
author_facet Winters-Hilt, Stephen
Merat, Sam
author_sort Winters-Hilt, Stephen
collection PubMed
description BACKGROUND: Support Vector Machines (SVMs) provide a powerful method for classification (supervised learning). Use of SVMs for clustering (unsupervised learning) is now being considered in a number of different ways. RESULTS: An SVM-based clustering algorithm is introduced that clusters data with no a priori knowledge of input classes. The algorithm initializes by first running a binary SVM classifier against a data set with each vector in the set randomly labelled; this is repeated until an initial convergence occurs. Once this initialization step is complete, the SVM confidence parameters for classification on each of the training instances can be accessed. The lowest-confidence data (e.g., the worst of the mislabelled data) then has its labels switched to the other class label. The SVM is then re-run on the data set (with partly re-labelled data) and is guaranteed to converge in this situation since it converged previously, and now it has fewer data points carrying mislabelling penalties. This approach appears to limit exposure to the local minima traps that can occur with other approaches. Thus, the algorithm then improves on its weakly convergent result by SVM re-training after each re-labelling of the worst of the misclassified vectors – i.e., those feature vectors with confidence factor values beyond some threshold. The repetition of the above process improves the accuracy, here a measure of separability, until there are no misclassifications. Variations on this type of clustering approach are shown. CONCLUSION: Non-parametric SVM-based clustering methods may allow for much improved performance over parametric approaches, particularly if they can be designed to inherit the strengths of their supervised SVM counterparts.
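The iterative scheme in the abstract (random initial labels, train a binary SVM, flip the lowest-confidence labels, retrain until no misclassifications) can be sketched in Python with scikit-learn's SVC. This is an illustrative sketch, not the authors' implementation: the RBF kernel, the flip fraction, and the stopping test here are assumptions, and the paper's exact convergence criterion is not reproduced.

```python
# Illustrative sketch of SVM-based clustering by iterative label flipping.
# Kernel choice, flip_frac, and n_iter are assumed values, not from the paper.
import numpy as np
from sklearn.svm import SVC

def svm_cluster(X, n_iter=50, flip_frac=0.1, random_state=0):
    """Cluster X into two groups: train a binary SVM on the current
    labels, then flip the labels of the lowest-confidence points."""
    rng = np.random.default_rng(random_state)
    # Step 1: assign each vector a random binary label.
    y = rng.integers(0, 2, size=len(X))
    for _ in range(n_iter):
        if len(np.unique(y)) < 2:          # degenerate labelling: stop
            break
        clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
        # Signed margin: positive if the point lies on its labelled side.
        # Small or negative values mark the low-confidence / mislabelled data.
        margin = clf.decision_function(X) * (2 * y - 1)
        worst = np.argsort(margin)[: max(1, int(flip_frac * len(X)))]
        if margin[worst].min() > 0:        # no misclassifications remain
            break
        y[worst] = 1 - y[worst]            # flip the worst-labelled points
    return y

# Example: two well-separated point clouds.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(5.0, 0.3, (20, 2))])
labels = svm_cluster(X)
```

The signed decision-function value stands in for the paper's "confidence parameters": points near or beyond the wrong side of the hyperplane are the ones whose labels get switched before retraining.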
format Text
id pubmed-2099486
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2099486 2007-12-03 SVM clustering Winters-Hilt, Stephen Merat, Sam BMC Bioinformatics Proceedings BACKGROUND: Support Vector Machines (SVMs) provide a powerful method for classification (supervised learning). Use of SVMs for clustering (unsupervised learning) is now being considered in a number of different ways. RESULTS: An SVM-based clustering algorithm is introduced that clusters data with no a priori knowledge of input classes. The algorithm initializes by first running a binary SVM classifier against a data set with each vector in the set randomly labelled; this is repeated until an initial convergence occurs. Once this initialization step is complete, the SVM confidence parameters for classification on each of the training instances can be accessed. The lowest-confidence data (e.g., the worst of the mislabelled data) then has its labels switched to the other class label. The SVM is then re-run on the data set (with partly re-labelled data) and is guaranteed to converge in this situation since it converged previously, and now it has fewer data points carrying mislabelling penalties. This approach appears to limit exposure to the local minima traps that can occur with other approaches. Thus, the algorithm then improves on its weakly convergent result by SVM re-training after each re-labelling of the worst of the misclassified vectors – i.e., those feature vectors with confidence factor values beyond some threshold. The repetition of the above process improves the accuracy, here a measure of separability, until there are no misclassifications. Variations on this type of clustering approach are shown. CONCLUSION: Non-parametric SVM-based clustering methods may allow for much improved performance over parametric approaches, particularly if they can be designed to inherit the strengths of their supervised SVM counterparts.
BioMed Central 2007-11-01 /pmc/articles/PMC2099486/ /pubmed/18047717 http://dx.doi.org/10.1186/1471-2105-8-S7-S18 Text en Copyright © 2007 Winters-Hilt and Merat; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Winters-Hilt, Stephen
Merat, Sam
SVM clustering
title SVM clustering
title_full SVM clustering
title_fullStr SVM clustering
title_full_unstemmed SVM clustering
title_short SVM clustering
title_sort svm clustering
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2099486/
https://www.ncbi.nlm.nih.gov/pubmed/18047717
http://dx.doi.org/10.1186/1471-2105-8-S7-S18
work_keys_str_mv AT wintershiltstephen svmclustering
AT meratsam svmclustering