
Evidence accumulation clustering using combinations of features

Evidence accumulation clustering (EAC) is an ensemble clustering algorithm that can cluster data for arbitrary shapes and numbers of clusters. Here, we present a variant of EAC in which we aimed to better cluster data with a large number of features, many of which may be uninformative. Our new metho...


Bibliographic Details
Main Authors: Wong, William, Tsuchiya, Naotsugu
Format: Online Article Text
Language: English
Published: Elsevier 2020
Subjects: Computer Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7251952/
https://www.ncbi.nlm.nih.gov/pubmed/32477894
http://dx.doi.org/10.1016/j.mex.2020.100916
author Wong, William
Tsuchiya, Naotsugu
collection PubMed
description Evidence accumulation clustering (EAC) is an ensemble clustering algorithm that can cluster data for arbitrary shapes and numbers of clusters. Here, we present a variant of EAC in which we aimed to better cluster data with a large number of features, many of which may be uninformative. Our new method builds on the existing EAC algorithm by populating the clustering ensemble with clusterings based on combinations of fewer features than the original dataset at a time. Our method also calls for prewhitening the recombined data and weighting the influence of each individual clustering by an estimate of its informativeness. We provide code of an example implementation of the algorithm in Matlab and demonstrate its effectiveness compared to ordinary evidence accumulation clustering with synthetic data.
• The clustering ensemble is made by clustering on subset combinations of features from the data;
• The recombined data may be prewhitened;
• Evidence accumulation can be improved by weighting the evidence with a goodness-of-clustering measure.
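As an illustration only, the feature-combination and weighted-accumulation ideas in the description can be sketched in a few lines of Python. This is not the authors' Matlab implementation: the function names `feature_subsets` and `weighted_coassociation` are hypothetical, the per-subset clusterings are assumed to have been computed already, prewhitening is omitted, and the weights stand in for whatever goodness-of-clustering measure is used.

```python
from itertools import combinations


def feature_subsets(n_features, subset_size):
    """Enumerate every combination of `subset_size` feature indices."""
    return list(combinations(range(n_features), subset_size))


def weighted_coassociation(labelings, weights):
    """Accumulate weighted evidence that each pair of points belongs together.

    labelings: one list of cluster labels per clustering in the ensemble
    weights:   one non-negative informativeness score per clustering
               (assumed to come from some goodness-of-clustering measure)
    Returns an n x n matrix whose entries lie in [0, 1].
    """
    n = len(labelings[0])
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one clustering needs a positive weight")
    coassoc = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                # A pair gains evidence whenever this clustering co-assigns it.
                if labels[i] == labels[j]:
                    coassoc[i][j] += w / total
    return coassoc


# Toy usage: 4 features taken 2 at a time, and a two-member ensemble
# over 4 data points with made-up labels and weights.
subsets = feature_subsets(4, 2)
labelings = [[0, 0, 1, 1], [0, 1, 1, 1]]
weights = [3.0, 1.0]
C = weighted_coassociation(labelings, weights)
```

In a full pipeline, each entry of `subsets` would select the columns handed to a base clusterer, and the resulting co-association matrix would then be passed to a final clustering step such as hierarchical linkage.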
format Online
Article
Text
id pubmed-7251952
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-7251952 2020-05-29 MethodsX Computer Science Elsevier 2020-05-14 /pmc/articles/PMC7251952/ /pubmed/32477894 http://dx.doi.org/10.1016/j.mex.2020.100916 Text en © 2020 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Evidence accumulation clustering using combinations of features
topic Computer Science