Cargando…

A generalized Robinson-Foulds distance for labeled trees

BACKGROUND: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial as...

Descripción completa

Detalles Bibliográficos
Autores principales: Briand, Samuel, Dessimoz, Christophe, El-Mabrouk, Nadia, Lafond, Manuel, Lobinska, Gabriela
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7677779/
https://www.ncbi.nlm.nih.gov/pubmed/33208096
http://dx.doi.org/10.1186/s12864-020-07011-0
_version_ 1783612046915403776
author Briand, Samuel
Dessimoz, Christophe
El-Mabrouk, Nadia
Lafond, Manuel
Lobinska, Gabriela
author_facet Briand, Samuel
Dessimoz, Christophe
El-Mabrouk, Nadia
Lafond, Manuel
Lobinska, Gabriela
author_sort Briand, Samuel
collection PubMed
description BACKGROUND: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). RESULTS: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting “good” edges, i.e. edges shared between the two trees. CONCLUSIONS: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions.Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf.
format Online
Article
Text
id pubmed-7677779
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-76777792020-11-20 A generalized Robinson-Foulds distance for labeled trees Briand, Samuel Dessimoz, Christophe El-Mabrouk, Nadia Lafond, Manuel Lobinska, Gabriela BMC Genomics Research BACKGROUND: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). RESULTS: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting “good” edges, i.e. edges shared between the two trees. CONCLUSIONS: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions.Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf. BioMed Central 2020-11-18 /pmc/articles/PMC7677779/ /pubmed/33208096 http://dx.doi.org/10.1186/s12864-020-07011-0 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Briand, Samuel
Dessimoz, Christophe
El-Mabrouk, Nadia
Lafond, Manuel
Lobinska, Gabriela
A generalized Robinson-Foulds distance for labeled trees
title A generalized Robinson-Foulds distance for labeled trees
title_full A generalized Robinson-Foulds distance for labeled trees
title_fullStr A generalized Robinson-Foulds distance for labeled trees
title_full_unstemmed A generalized Robinson-Foulds distance for labeled trees
title_short A generalized Robinson-Foulds distance for labeled trees
title_sort generalized robinson-foulds distance for labeled trees
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7677779/
https://www.ncbi.nlm.nih.gov/pubmed/33208096
http://dx.doi.org/10.1186/s12864-020-07011-0
work_keys_str_mv AT briandsamuel ageneralizedrobinsonfouldsdistanceforlabeledtrees
AT dessimozchristophe ageneralizedrobinsonfouldsdistanceforlabeledtrees
AT elmabrouknadia ageneralizedrobinsonfouldsdistanceforlabeledtrees
AT lafondmanuel ageneralizedrobinsonfouldsdistanceforlabeledtrees
AT lobinskagabriela ageneralizedrobinsonfouldsdistanceforlabeledtrees
AT briandsamuel generalizedrobinsonfouldsdistanceforlabeledtrees
AT dessimozchristophe generalizedrobinsonfouldsdistanceforlabeledtrees
AT elmabrouknadia generalizedrobinsonfouldsdistanceforlabeledtrees
AT lafondmanuel generalizedrobinsonfouldsdistanceforlabeledtrees
AT lobinskagabriela generalizedrobinsonfouldsdistanceforlabeledtrees