Interactive visual exploration and refinement of cluster assignments


Bibliographic Details
Main authors: Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5596943/
https://www.ncbi.nlm.nih.gov/pubmed/28899361
http://dx.doi.org/10.1186/s12859-017-1813-7
_version_ 1783263626461708288
author Kern, Michael
Lex, Alexander
Gehlenborg, Nils
Johnson, Chris R.
author_facet Kern, Michael
Lex, Alexander
Gehlenborg, Nils
Johnson, Chris R.
author_sort Kern, Michael
collection PubMed
description BACKGROUND: With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms do not properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. RESULTS: In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results, and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in the context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. CONCLUSIONS: Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1813-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5596943
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5596943 2017-09-15 Interactive visual exploration and refinement of cluster assignments Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R. BMC Bioinformatics Software BioMed Central 2017-09-12 /pmc/articles/PMC5596943/ /pubmed/28899361 http://dx.doi.org/10.1186/s12859-017-1813-7 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Kern, Michael
Lex, Alexander
Gehlenborg, Nils
Johnson, Chris R.
Interactive visual exploration and refinement of cluster assignments
title Interactive visual exploration and refinement of cluster assignments
title_full Interactive visual exploration and refinement of cluster assignments
title_fullStr Interactive visual exploration and refinement of cluster assignments
title_full_unstemmed Interactive visual exploration and refinement of cluster assignments
title_short Interactive visual exploration and refinement of cluster assignments
title_sort interactive visual exploration and refinement of cluster assignments
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5596943/
https://www.ncbi.nlm.nih.gov/pubmed/28899361
http://dx.doi.org/10.1186/s12859-017-1813-7
work_keys_str_mv AT kernmichael interactivevisualexplorationandrefinementofclusterassignments
AT lexalexander interactivevisualexplorationandrefinementofclusterassignments
AT gehlenborgnils interactivevisualexplorationandrefinementofclusterassignments
AT johnsonchrisr interactivevisualexplorationandrefinementofclusterassignments