Significant distinct branches of hierarchical trees: a framework for statistical analysis and applications to biological data
BACKGROUND: One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance...
Main Authors: | Sun, Guoli; Krasnitz, Alexander |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4253613/ https://www.ncbi.nlm.nih.gov/pubmed/25409689 http://dx.doi.org/10.1186/1471-2164-15-1000 |
_version_ | 1782347265456734208 |
---|---|
author | Sun, Guoli Krasnitz, Alexander |
author_facet | Sun, Guoli Krasnitz, Alexander |
author_sort | Sun, Guoli |
collection | PubMed |
description | BACKGROUND: One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity. RESULTS: We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. For each dataset there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques. CONCLUSIONS: Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: http://www.cran.r-project.org/web/packages/TBEST/index.html. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-1000) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4253613 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4253613 2014-12-04 Significant distinct branches of hierarchical trees: a framework for statistical analysis and applications to biological data Sun, Guoli Krasnitz, Alexander BMC Genomics Methodology Article 
BioMed Central 2014-11-19 /pmc/articles/PMC4253613/ /pubmed/25409689 http://dx.doi.org/10.1186/1471-2164-15-1000 Text en © Sun and Krasnitz; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Sun, Guoli Krasnitz, Alexander Significant distinct branches of hierarchical trees: a framework for statistical analysis and applications to biological data |
title | Significant distinct branches of hierarchical trees: a framework for statistical analysis and applications to biological data |
title_sort | significant distinct branches of hierarchical trees: a framework for statistical analysis and applications to biological data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4253613/ https://www.ncbi.nlm.nih.gov/pubmed/25409689 http://dx.doi.org/10.1186/1471-2164-15-1000 |
work_keys_str_mv | AT sunguoli significantdistinctbranchesofhierarchicaltreesaframeworkforstatisticalanalysisandapplicationstobiologicaldata AT krasnitzalexander significantdistinctbranchesofhierarchicaltreesaframeworkforstatisticalanalysisandapplicationstobiologicaldata |
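
The abstract above says that tightness is "a rational function of heights, both of the branch and of its parent" but does not give the formula. The following is a minimal sketch of one such function, assuming tightness is the relative drop in height from parent to branch, (h_parent - h_branch) / h_parent. This form, the toy tree, and the names `Tree`, `tightness`, and `toy_tree` are illustrative assumptions, not the TBEST package's API, and the significance test mentioned in the abstract is not reproduced here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tree encoding: node name -> (merge height, parent name).
# The root has parent None. Heights must grow from branch to parent.
Tree = Dict[str, Tuple[float, Optional[str]]]


def tightness(tree: Tree, node: str) -> float:
    """Assumed tightness: relative height drop from parent to branch.

    Values near 1 mean the branch merges internally at a much lower
    height than it joins the rest of the tree (a tight branch);
    values near 0 mean the branch is barely separated from its parent.
    """
    height, parent = tree[node]
    if parent is None:
        raise ValueError("the root has no parent, so tightness is undefined")
    parent_height = tree[parent][0]
    if parent_height <= 0:
        raise ValueError("parent height must be positive")
    return (parent_height - height) / parent_height


# Toy example (invented numbers, not data from the article).
toy_tree: Tree = {
    "root": (10.0, None),
    "A": (2.0, "root"),   # merges low, joins the root high
    "B": (8.0, "root"),   # merges almost as high as the root
}

if __name__ == "__main__":
    for name in ("A", "B"):
        print(name, tightness(toy_tree, name))
```

Under this assumed definition, branch A scores higher than branch B because its internal merge height is small relative to the height at which it joins its parent.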