
HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree

Hierarchical clustering (HC) is one of the most frequently used methods in computational biology for the analysis of high-dimensional genomics data. Given a data set, HC outputs a binary tree whose leaves are the data points and whose internal nodes represent clusters of various sizes. Normally, a fixed...


Bibliographic Details
Main authors: Obulkasim, Askar, van de Wiel, Mark A
Format: Online Article Text
Language: English
Published: Libertas Academica 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4372030/
https://www.ncbi.nlm.nih.gov/pubmed/25861213
http://dx.doi.org/10.4137/CIN.S22080
_version_ 1782363117682950144
author Obulkasim, Askar
van de Wiel, Mark A
author_facet Obulkasim, Askar
van de Wiel, Mark A
author_sort Obulkasim, Askar
collection PubMed
description Hierarchical clustering (HC) is one of the most frequently used methods in computational biology for the analysis of high-dimensional genomics data. Given a data set, HC outputs a binary tree whose leaves are the data points and whose internal nodes represent clusters of various sizes. Normally, a fixed-height cut on the HC tree is chosen, and each contiguous branch of data points below that height is considered a separate cluster. However, the fixed-height branch cut may not be ideal in situations where one expects a complicated tree structure with nested clusters. Furthermore, because related background information is not used in selecting the cutoff, the induced clusters are often difficult to interpret. This paper describes a novel procedure that aims to automatically extract meaningful clusters from the HC tree in a semi-supervised way. The procedure is implemented in the R package HCsnip, available from Bioconductor. Rather than cutting the HC tree at a fixed height, HCsnip probes various ways of snipping, possibly at variable heights, to tease out hidden clusters ensconced deep down in the tree. The cluster extraction process utilizes, along with the data set from which the HC tree is derived, commonly available background information. Consequently, the extracted clusters are highly reproducible and robust against the various sources of variation that “haunt” high-dimensional genomics data. Since the clustering process is guided by the background information, clusters are easy to interpret. Unlike existing packages, no constraint is placed on the data type on which clustering is desired. In particular, the package accepts patient follow-up data for guiding the cluster extraction process. To our knowledge, HCsnip is the first package that is able to decompose the HC tree into clusters with piecewise snipping under the guidance of patient time-to-event information.
Our implementation of the semi-supervised HC tree snipping framework is generic and can be combined with other algorithms that operate on detected clusters.
format Online
Article
Text
id pubmed-4372030
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-4372030 2015-04-08 HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree Obulkasim, Askar van de Wiel, Mark A Cancer Inform Software or Database Review Libertas Academica 2015-03-22 /pmc/articles/PMC4372030/ /pubmed/25861213 http://dx.doi.org/10.4137/CIN.S22080 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
spellingShingle Software or Database Review
Obulkasim, Askar
van de Wiel, Mark A
HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree
title HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree
topic Software or Database Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4372030/
https://www.ncbi.nlm.nih.gov/pubmed/25861213
http://dx.doi.org/10.4137/CIN.S22080