An analysis framework for clustering algorithm selection with applications to spectroscopy
Cluster analysis is a valuable unsupervised machine learning technique that is applied in a multitude of domains to identify similarities or clusters in unlabelled data. However, its performance is dependent on the characteristics of the data it is being applied to. There is no universally best clus...
Main authors: | Crase, Simon; Thennadil, Suresh N. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8970496/ https://www.ncbi.nlm.nih.gov/pubmed/35358292 http://dx.doi.org/10.1371/journal.pone.0266369 |
_version_ | 1784679469100826624 |
---|---|
author | Crase, Simon; Thennadil, Suresh N. |
author_facet | Crase, Simon; Thennadil, Suresh N. |
author_sort | Crase, Simon |
collection | PubMed |
description | Cluster analysis is a valuable unsupervised machine learning technique that is applied in a multitude of domains to identify similarities or clusters in unlabelled data. However, its performance depends on the characteristics of the data it is applied to. There is no universally best clustering algorithm, and hence, there are numerous clustering algorithms available with different performance characteristics. This raises the problem of how to select an appropriate clustering algorithm for the given analytical purposes. We present and validate an analysis framework to address this problem. Unlike most current literature, which focuses on characterizing the clustering algorithm itself, we present a wider holistic approach, with a focus on the user’s needs, the data’s characteristics and the characteristics of the clusters it may contain. In our analysis framework, we utilize a softer qualitative approach to identify appropriate characteristics for consideration when matching clustering algorithms to the intended application. These are used to generate a small subset of suitable clustering algorithms whose performance is then evaluated utilizing quantitative cluster validity indices. To validate our analysis framework for selecting clustering algorithms, we applied it to four different types of datasets: three datasets of homemade explosives spectroscopy, eight datasets of publicly available spectroscopy data covering food and biomedical applications, a gene expression cancer dataset, and three classic machine learning datasets. Each data type has discernible differences in the composition of the data and the context within which they are used. Our analysis framework, when applied to each of these challenges, recommended differing subsets of clustering algorithms for final quantitative performance evaluation. For each application, the recommended clustering algorithms were confirmed to contain the top performing algorithms through quantitative performance indices. |
format | Online Article Text |
id | pubmed-8970496 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8970496 2022-04-01 An analysis framework for clustering algorithm selection with applications to spectroscopy Crase, Simon; Thennadil, Suresh N. PLoS One Research Article Public Library of Science 2022-03-31 /pmc/articles/PMC8970496/ /pubmed/35358292 http://dx.doi.org/10.1371/journal.pone.0266369 Text en © 2022 Crase, Thennadil https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | An analysis framework for clustering algorithm selection with applications to spectroscopy |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8970496/ https://www.ncbi.nlm.nih.gov/pubmed/35358292 http://dx.doi.org/10.1371/journal.pone.0266369 |
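
The description in this record says that a small subset of candidate clustering algorithms is evaluated with quantitative cluster validity indices. The following is a minimal sketch of that final evaluation step only, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the record does not name the indices or algorithms used, so the silhouette index, the two candidates (a plain k-means and a single-linkage agglomerative method), the toy two-blob data, and all function names are hypothetical stand-ins.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means with a deterministic start (the first k points)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def single_linkage(points: List[Point], k: int) -> List[int]:
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        a, b = min(
            pairs,
            key=lambda ij: min(
                math.dist(points[p], points[q])
                for p in clusters[ij[0]]
                for q in clusters[ij[1]]
            ),
        )
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient (one example of a cluster validity index)."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def rank_candidates(
    points: List[Point], candidates: Dict[str, Callable[[List[Point]], List[int]]]
) -> List[Tuple[str, float]]:
    """Score each candidate's labelling and sort from best to worst."""
    scored = [(name, silhouette(points, fit(points))) for name, fit in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (assumption): two well-separated blobs in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
candidates = {
    "k-means": lambda pts: kmeans(pts, 2),
    "single-linkage": lambda pts: single_linkage(pts, 2),
}
ranking = rank_candidates(points, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In practice the candidate list would come from the qualitative matching step described in the abstract, and more than one validity index would normally be reported, since a single index can favour particular cluster shapes.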