HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree
Hierarchical clustering (HC) is one of the most frequently used methods in computational biology for the analysis of high-dimensional genomics data. Given a data set, HC outputs a binary tree whose leaves are the data points and whose internal nodes represent clusters of various sizes. Normally, a fixed...
Main authors: | Obulkasim, Askar; van de Wiel, Mark A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4372030/ https://www.ncbi.nlm.nih.gov/pubmed/25861213 http://dx.doi.org/10.4137/CIN.S22080 |
_version_ | 1782363117682950144 |
---|---|
author | Obulkasim, Askar van de Wiel, Mark A |
author_facet | Obulkasim, Askar van de Wiel, Mark A |
author_sort | Obulkasim, Askar |
collection | PubMed |
description | Hierarchical clustering (HC) is one of the most frequently used methods in computational biology for the analysis of high-dimensional genomics data. Given a data set, HC outputs a binary tree whose leaves are the data points and whose internal nodes represent clusters of various sizes. Normally, a fixed-height cut on the HC tree is chosen, and each contiguous branch of data points below that height is considered a separate cluster. However, the fixed-height branch cut may not be ideal in situations where one expects a complicated tree structure with nested clusters. Furthermore, because related background information is not used in selecting the cutoff, the induced clusters are often difficult to interpret. This paper describes a novel procedure that aims to automatically extract meaningful clusters from the HC tree in a semi-supervised way. The procedure is implemented in the R package HCsnip, available from Bioconductor. Rather than cutting the HC tree at a fixed height, HCsnip probes various ways of snipping, possibly at variable heights, to tease out hidden clusters ensconced deep down in the tree. The cluster extraction process utilizes, along with the data set from which the HC tree is derived, commonly available background information. Consequently, the extracted clusters are highly reproducible and robust against the various sources of variation that “haunt” high-dimensional genomics data. Since the clustering process is guided by the background information, clusters are easy to interpret. Unlike existing packages, no constraint is placed on the data type on which clustering is desired. In particular, the package accepts patient follow-up data for guiding the cluster extraction process. To our knowledge, HCsnip is the first package that is able to decompose the HC tree into clusters with piecewise snipping under the guidance of patient time-to-event information. Our implementation of the semi-supervised HC tree snipping framework is generic and can be combined with other algorithms that operate on detected clusters. |
format | Online Article Text |
id | pubmed-4372030 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-4372030 2015-04-08 HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree Obulkasim, Askar van de Wiel, Mark A Cancer Inform Software or Database Review Libertas Academica 2015-03-22 /pmc/articles/PMC4372030/ /pubmed/25861213 http://dx.doi.org/10.4137/CIN.S22080 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License. |
spellingShingle | Software or Database Review Obulkasim, Askar van de Wiel, Mark A HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree |
title | HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree |
title_full | HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree |
title_fullStr | HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree |
title_full_unstemmed | HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree |
title_short | HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree |
title_sort | hcsnip: an r package for semi-supervised snipping of the hierarchical clustering tree |
topic | Software or Database Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4372030/ https://www.ncbi.nlm.nih.gov/pubmed/25861213 http://dx.doi.org/10.4137/CIN.S22080 |
work_keys_str_mv | AT obulkasimaskar hcsnipanrpackageforsemisupervisedsnippingofthehierarchicalclusteringtree AT vandewielmarka hcsnipanrpackageforsemisupervisedsnippingofthehierarchicalclusteringtree |