Cargando…
Tree edit distance for leaf-labelled trees on free leafset and its comparison with frequent subsplit dissimilarity and popular distance measures
BACKGROUND: This paper is devoted to distance measures for leaf-labelled trees on free leafset. A leaf-labelled tree is a data structure which is a special type of a tree where only leaves (terminal) nodes are labelled. This data structure is used in bioinformatics for modelling of evolution history...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2011
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117735/ https://www.ncbi.nlm.nih.gov/pubmed/21612645 http://dx.doi.org/10.1186/1471-2105-12-204 |
_version_ | 1782206363165786112 |
---|---|
author | Koperwas, Jakub Walczak, Krzysztof |
author_facet | Koperwas, Jakub Walczak, Krzysztof |
author_sort | Koperwas, Jakub |
collection | PubMed |
description | BACKGROUND: This paper is devoted to distance measures for leaf-labelled trees on free leafset. A leaf-labelled tree is a data structure which is a special type of a tree where only leaves (terminal) nodes are labelled. This data structure is used in bioinformatics for modelling of evolution history of genes and species and also in linguistics for modelling of languages evolution history. Many domain specific problems occur and need to be solved with help of tree postprocessing techniques such as distance measures. RESULTS: Here we introduce the tree edit distance designed for leaf labelled trees on free leafset, which occurs to be a metric. It is presented together with tree edit consensus tree notion. We provide statistical evaluation of provided measure with respect to R-F, MAST and frequent subsplit based dissimilarity measures as the reference measures. CONCLUSIONS: The tree edit distance was proven to be a metric and has the advantage of using different costs for contraction and pruning, therefore their properties can be tuned depending on the needs of the user. Two of the presented methods carry the most interesting properties. E(3,1) is very discriminative (having a wide range of values) and has a very regular distance distribution which is similar to a normal distribution in its shape and is good both for similar and non-similar trees. NFC(2,1) on the other hand is proportional or nearly proportional to the number of mutation operations used, irrespective of their type. |
format | Online Article Text |
id | pubmed-3117735 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-31177352011-06-18 Tree edit distance for leaf-labelled trees on free leafset and its comparison with frequent subsplit dissimilarity and popular distance measures Koperwas, Jakub Walczak, Krzysztof BMC Bioinformatics Methodology Article BACKGROUND: This paper is devoted to distance measures for leaf-labelled trees on free leafset. A leaf-labelled tree is a data structure which is a special type of a tree where only leaves (terminal) nodes are labelled. This data structure is used in bioinformatics for modelling of evolution history of genes and species and also in linguistics for modelling of languages evolution history. Many domain specific problems occur and need to be solved with help of tree postprocessing techniques such as distance measures. RESULTS: Here we introduce the tree edit distance designed for leaf labelled trees on free leafset, which occurs to be a metric. It is presented together with tree edit consensus tree notion. We provide statistical evaluation of provided measure with respect to R-F, MAST and frequent subsplit based dissimilarity measures as the reference measures. CONCLUSIONS: The tree edit distance was proven to be a metric and has the advantage of using different costs for contraction and pruning, therefore their properties can be tuned depending on the needs of the user. Two of the presented methods carry the most interesting properties. E(3,1) is very discriminative (having a wide range of values) and has a very regular distance distribution which is similar to a normal distribution in its shape and is good both for similar and non-similar trees. NFC(2,1) on the other hand is proportional or nearly proportional to the number of mutation operations used, irrespective of their type. BioMed Central 2011-05-25 /pmc/articles/PMC3117735/ /pubmed/21612645 http://dx.doi.org/10.1186/1471-2105-12-204 Text en Copyright ©2011 Koperwas and Walczak; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Koperwas, Jakub Walczak, Krzysztof Tree edit distance for leaf-labelled trees on free leafset and its comparison with frequent subsplit dissimilarity and popular distance measures |
title | Tree edit distance for leaf-labelled trees on free leafset and its comparison with frequent subsplit dissimilarity and popular distance measures |
title_full | Tree edit distance for leaf-labelled trees on free leafset and its comparison with frequent subsplit dissimilarity and popular distance measures |
title_fullStr | Tree edit distance for leaf-labelled trees on free leafset and its comparison with frequent subsplit dissimilarity and popular distance measures |
title_full_unstemmed | Tree edit distance for leaf-labelled trees on free leafset and its comparison with frequent subsplit dissimilarity and popular distance measures |
title_short | Tree edit distance for leaf-labelled trees on free leafset and its comparison with frequent subsplit dissimilarity and popular distance measures |
title_sort | tree edit distance for leaf-labelled trees on free leafset and its comparison with frequent subsplit dissimilarity and popular distance measures |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117735/ https://www.ncbi.nlm.nih.gov/pubmed/21612645 http://dx.doi.org/10.1186/1471-2105-12-204 |
work_keys_str_mv | AT koperwasjakub treeeditdistanceforleaflabelledtreesonfreeleafsetanditscomparisonwithfrequentsubsplitdissimilarityandpopulardistancemeasures AT walczakkrzysztof treeeditdistanceforleaflabelledtreesonfreeleafsetanditscomparisonwithfrequentsubsplitdissimilarityandpopulardistancemeasures |