
Silhouette Analysis for Performance Evaluation in Machine Learning with Applications to Clustering


Bibliographic Details
Main authors:	Shutaywi, Meshal; Kachouie, Nezamoddin N.
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8234541/ https://www.ncbi.nlm.nih.gov/pubmed/34208552 http://dx.doi.org/10.3390/e23060759

author	Shutaywi, Meshal; Kachouie, Nezamoddin N.
collection	PubMed
description	Grouping objects by similarity is a common and important task in machine learning applications. Many clustering methods have been developed. Among them, k-means-based methods are widely used, and several extensions have been proposed to improve the original k-means algorithm, such as k-means++ and kernel k-means. K-means is a linear clustering method, meaning that it divides the objects into linearly separable groups, whereas kernel k-means is a non-linear technique. Kernel k-means maps the objects into a higher-dimensional feature space using a kernel function and then groups them in that space. Different kernel functions may perform differently when clustering the same data set, so choosing the right kernel for an application can be challenging. In our previous work, we introduced a weighted majority voting method for clustering based on normalized mutual information (NMI). NMI is a supervised criterion because computing it requires the true labels of a training set. In this study, we extend that approach to aggregating clustering results by developing an unsupervised weighting function for cases where no training set is available. The proposed weighting function is based on the Silhouette index, which is an unsupervised criterion, so no training set is required to compute it. This makes the new method more consistent with the unsupervised nature of clustering.
format	Online Article Text
id	pubmed-8234541
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8234541 2021-06-27 Silhouette Analysis for Performance Evaluation in Machine Learning with Applications to Clustering. Shutaywi, Meshal; Kachouie, Nezamoddin N. Entropy (Basel), Article. MDPI 2021-06-16 /pmc/articles/PMC8234541/ /pubmed/34208552 http://dx.doi.org/10.3390/e23060759 Text en. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Silhouette Analysis for Performance Evaluation in Machine Learning with Applications to Clustering
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8234541/ https://www.ncbi.nlm.nih.gov/pubmed/34208552 http://dx.doi.org/10.3390/e23060759
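
The abstract contrasts two criteria. NMI needs ground-truth labels, while the Silhouette index needs only the data and the cluster assignments. The following is a minimal sketch, assuming Euclidean distance, of the standard mean silhouette coefficient in plain Python. For each point, it computes a(i), the mean distance to the point's own cluster, and b(i), the smallest mean distance to another cluster. It then averages s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points. The `silhouette_weights` helper is hypothetical. It clips negative scores to zero and normalizes them to sum to one. This is an assumption made for illustration and is not necessarily the weighting function proposed in the paper.

```python
import math
from collections import defaultdict


def silhouette_index(points, labels):
    """Mean silhouette coefficient of a clustering, using Euclidean distance.

    Requires at least two clusters. Points in singleton clusters get
    s(i) = 0, the usual convention.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton: s(i) = 0 adds nothing to the sum
        # a(i): mean distance from point i to the other members of its cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance from point i to the members of another cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def silhouette_weights(points, candidate_labelings):
    """HYPOTHETICAL weighting: clip negative scores to 0 and normalize to sum to 1."""
    scores = [max(silhouette_index(points, labs), 0.0) for labs in candidate_labelings]
    total = sum(scores)
    if total == 0:
        # no candidate shows positive structure: fall back to equal weights
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
    separated = [0, 0, 1, 1]  # groups the two left points and the two right points
    mixed = [0, 1, 0, 1]      # groups points across the wide gap
    print(round(silhouette_index(points, separated), 3))  # 0.802
    print(round(silhouette_index(points, mixed), 3))      # -0.39
    print(silhouette_weights(points, [separated, mixed]))  # [1.0, 0.0]
```

In the usage example, the labeling that respects the gap between the two pairs of points scores about 0.80. The labeling that cuts across the gap scores about -0.39. Under this assumed weighting, every candidate clustering (for example, one per kernel) would be scored without any training labels. scikit-learn's `sklearn.metrics.silhouette_score` computes the same mean coefficient for larger data sets.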


Similar Items