
Convex Clustering: An Attractive Alternative to Hierarchical Clustering

The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominan...


Bibliographic Details
Main Authors: Chen, Gary K., Chi, Eric C., Ranola, John Michael O., Lange, Kenneth
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4429070/
https://www.ncbi.nlm.nih.gov/pubmed/25965340
http://dx.doi.org/10.1371/journal.pcbi.1004228
_version_ 1782370975596150784
author Chen, Gary K.
Chi, Eric C.
Ranola, John Michael O.
Lange, Kenneth
collection PubMed
description The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/
format Online
Article
Text
id pubmed-4429070
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4429070 2015-05-21 Convex Clustering: An Attractive Alternative to Hierarchical Clustering Chen, Gary K. Chi, Eric C. Ranola, John Michael O. Lange, Kenneth PLoS Comput Biol Research Article Public Library of Science 2015-05-12 /pmc/articles/PMC4429070/ /pubmed/25965340 http://dx.doi.org/10.1371/journal.pcbi.1004228 Text en © 2015 Chen et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Convex Clustering: An Attractive Alternative to Hierarchical Clustering
topic Research Article