Cargando…

Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K

In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a...

Descripción completa

Detalles Bibliográficos
Autores principales: Sweeney, Timothy E., Chen, Albert C., Gevaert, Olivier
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Nature Publishing Group 2015
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4652212/
https://www.ncbi.nlm.nih.gov/pubmed/26581809
http://dx.doi.org/10.1038/srep16971
_version_ 1782401707311890432
author Sweeney, Timothy E.
Chen, Albert C.
Gevaert, Olivier
author_facet Sweeney, Timothy E.
Chen, Albert C.
Gevaert, Olivier
author_sort Sweeney, Timothy E.
collection PubMed
description In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of ‘dark art’, with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R.
format Online
Article
Text
id pubmed-4652212
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-46522122015-11-24 Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K Sweeney, Timothy E. Chen, Albert C. Gevaert, Olivier Sci Rep Article In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of ‘dark art’, with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R. Nature Publishing Group 2015-11-19 /pmc/articles/PMC4652212/ /pubmed/26581809 http://dx.doi.org/10.1038/srep16971 Text en Copyright © 2015, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
spellingShingle Article
Sweeney, Timothy E.
Chen, Albert C.
Gevaert, Olivier
Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K
title Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K
title_full Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K
title_fullStr Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K
title_full_unstemmed Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K
title_short Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K
title_sort combined mapping of multiple clustering algorithms (communal): a robust method for selection of cluster number, k
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4652212/
https://www.ncbi.nlm.nih.gov/pubmed/26581809
http://dx.doi.org/10.1038/srep16971
work_keys_str_mv AT sweeneytimothye combinedmappingofmultipleclusteringalgorithmscommunalarobustmethodforselectionofclusternumberk
AT chenalbertc combinedmappingofmultipleclusteringalgorithmscommunalarobustmethodforselectionofclusternumberk
AT gevaertolivier combinedmappingofmultipleclusteringalgorithmscommunalarobustmethodforselectionofclusternumberk