Semi-supervised adaptive-height snipping of the hierarchical clustering tree
BACKGROUND: In genomics, hierarchical clustering (HC) is a popular method for grouping similar samples based on a distance measure. HC algorithms do not actually create clusters, but compute a hierarchical representation of the data set. Usually, a fixed height on the HC tree is used, and each conti...
Main authors: | Obulkasim, Askar; Meijer, Gerrit A; van de Wiel, Mark A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2015
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302100/ https://www.ncbi.nlm.nih.gov/pubmed/25592847 http://dx.doi.org/10.1186/s12859-014-0448-1 |
_version_ | 1782353737072771072 |
---|---|
author | Obulkasim, Askar Meijer, Gerrit A van de Wiel, Mark A |
author_facet | Obulkasim, Askar Meijer, Gerrit A van de Wiel, Mark A |
author_sort | Obulkasim, Askar |
collection | PubMed |
description | BACKGROUND: In genomics, hierarchical clustering (HC) is a popular method for grouping similar samples based on a distance measure. HC algorithms do not actually create clusters, but compute a hierarchical representation of the data set. Usually, a fixed height on the HC tree is used, and each contiguous branch of samples below that height is considered a separate cluster. Due to the fixed-height cutting, those clusters may not unravel significant functional coherence hidden deeper in the tree. Besides that, most existing approaches do not make use of available clinical information to guide cluster extraction from the HC. Thus, the identified subgroups may be difficult to interpret in relation to that information. RESULTS: We develop a novel framework for decomposing the HC tree into clusters by semi-supervised piecewise snipping. The framework, called guided piecewise snipping, utilizes both molecular data and clinical information to decompose the HC tree into clusters. It cuts the given HC tree at variable heights to find a partition (a set of non-overlapping clusters) which does not only represent a structure deemed to underlie the data from which HC tree is derived, but is also maximally consistent with the supplied clinical data. Moreover, the approach does not require the user to specify the number of clusters prior to the analysis. Extensive results on simulated and multiple medical data sets show that our approach consistently produces more meaningful clusters than the standard fixed-height cut and/or non-guided approaches. CONCLUSIONS: The guided piecewise snipping approach features several novelties and advantages over existing approaches. The proposed algorithm is generic, and can be combined with other algorithms that operate on detected clusters. 
This approach represents an advancement in several regards: (1) a piecewise tree snipping framework that efficiently extracts clusters by snipping the HC tree possibly at variable heights while preserving the HC tree structure; (2) a flexible implementation allowing a variety of data types for both building and snipping the HC tree, including patient follow-up data like survival as auxiliary information. The data sets and R code are provided as supplementary files. The proposed method is available from Bioconductor as the R-package HCsnip. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0448-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4302100 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-43021002015-01-22 Semi-supervised adaptive-height snipping of the hierarchical clustering tree Obulkasim, Askar Meijer, Gerrit A van de Wiel, Mark A BMC Bioinformatics Research Article BACKGROUND: In genomics, hierarchical clustering (HC) is a popular method for grouping similar samples based on a distance measure. HC algorithms do not actually create clusters, but compute a hierarchical representation of the data set. Usually, a fixed height on the HC tree is used, and each contiguous branch of samples below that height is considered a separate cluster. Due to the fixed-height cutting, those clusters may not unravel significant functional coherence hidden deeper in the tree. Besides that, most existing approaches do not make use of available clinical information to guide cluster extraction from the HC. Thus, the identified subgroups may be difficult to interpret in relation to that information. RESULTS: We develop a novel framework for decomposing the HC tree into clusters by semi-supervised piecewise snipping. The framework, called guided piecewise snipping, utilizes both molecular data and clinical information to decompose the HC tree into clusters. It cuts the given HC tree at variable heights to find a partition (a set of non-overlapping clusters) which does not only represent a structure deemed to underlie the data from which HC tree is derived, but is also maximally consistent with the supplied clinical data. Moreover, the approach does not require the user to specify the number of clusters prior to the analysis. Extensive results on simulated and multiple medical data sets show that our approach consistently produces more meaningful clusters than the standard fixed-height cut and/or non-guided approaches. CONCLUSIONS: The guided piecewise snipping approach features several novelties and advantages over existing approaches. The proposed algorithm is generic, and can be combined with other algorithms that operate on detected clusters. 
This approach represents an advancement in several regards: (1) a piecewise tree snipping framework that efficiently extracts clusters by snipping the HC tree possibly at variable heights while preserving the HC tree structure; (2) a flexible implementation allowing a variety of data types for both building and snipping the HC tree, including patient follow-up data like survival as auxiliary information. The data sets and R code are provided as supplementary files. The proposed method is available from Bioconductor as the R-package HCsnip. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0448-1) contains supplementary material, which is available to authorized users. BioMed Central 2015-01-16 /pmc/articles/PMC4302100/ /pubmed/25592847 http://dx.doi.org/10.1186/s12859-014-0448-1 Text en © Obulkasim et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Obulkasim, Askar Meijer, Gerrit A van de Wiel, Mark A Semi-supervised adaptive-height snipping of the hierarchical clustering tree |
title | Semi-supervised adaptive-height snipping of the hierarchical clustering tree |
title_full | Semi-supervised adaptive-height snipping of the hierarchical clustering tree |
title_fullStr | Semi-supervised adaptive-height snipping of the hierarchical clustering tree |
title_full_unstemmed | Semi-supervised adaptive-height snipping of the hierarchical clustering tree |
title_short | Semi-supervised adaptive-height snipping of the hierarchical clustering tree |
title_sort | semi-supervised adaptive-height snipping of the hierarchical clustering tree |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302100/ https://www.ncbi.nlm.nih.gov/pubmed/25592847 http://dx.doi.org/10.1186/s12859-014-0448-1 |
work_keys_str_mv | AT obulkasimaskar semisupervisedadaptiveheightsnippingofthehierarchicalclusteringtree AT meijergerrita semisupervisedadaptiveheightsnippingofthehierarchicalclusteringtree AT vandewielmarka semisupervisedadaptiveheightsnippingofthehierarchicalclusteringtree |
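
The contrast the abstract draws between a fixed-height cut and snipping at variable heights can be illustrated with a small sketch. This is not the HCsnip implementation and does not reproduce the authors' guided selection step: it assumes one-dimensional toy data, single-linkage merging, and hand-picked subtree roots, and all function names are hypothetical.

```python
from itertools import combinations


def single_linkage(points):
    """Build an HC tree; return merges as (left_id, right_id, height).

    Leaves have ids 0..n-1; the k-th merge creates node id n + k.
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest single-linkage distance.
        (a, b), height = min(
            (
                ((a, b), min(abs(points[i] - points[j])
                             for i in clusters[a] for j in clusters[b]))
                for a, b in combinations(clusters, 2)
            ),
            key=lambda pair_and_height: pair_and_height[1],
        )
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
        next_id += 1
    return merges


def cut_at_height(n_leaves, merges, height):
    """Fixed-height cut: every branch below `height` becomes one cluster."""
    members = {i: [i] for i in range(n_leaves)}
    for k, (a, b, h) in enumerate(merges):
        if h > height:
            break  # single-linkage heights are non-decreasing
        members[n_leaves + k] = members.pop(a) + members.pop(b)
    return sorted(sorted(m) for m in members.values())


def leaves_under(n_leaves, merges, node):
    """Return the sorted leaf ids below a tree node."""
    if node < n_leaves:
        return [node]
    a, b, _ = merges[node - n_leaves]
    return sorted(leaves_under(n_leaves, merges, a)
                  + leaves_under(n_leaves, merges, b))


def snip(n_leaves, merges, roots):
    """Variable-height partition: one cluster per chosen subtree root."""
    return sorted(leaves_under(n_leaves, merges, r) for r in roots)


points = [0.0, 1.0, 5.0, 5.5, 6.0, 20.0]
merges = single_linkage(points)
n = len(points)

fixed = cut_at_height(n, merges, 2.0)
# Hand-picked roots: keep samples 0 and 1 together (merged at height 1.0)
# while separating sample 4 from samples 2 and 3 (merged at height 0.5).
variable = snip(n, merges, [8, 6, 4, 5])

print("fixed-height cut:", fixed)
print("variable-height snip:", variable)
```

No single cut height yields the second partition, because samples 2, 3 and 4 join at a lower height than samples 0 and 1. In the article, the choice of where to snip is guided by clinical information rather than made by hand as it is here.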