Cargando…

Heuristic algorithms for best match graph editing

BACKGROUND: Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is...

Descripción completa

Detalles Bibliográficos
Autores principales: Schaller, David, Geiß, Manuela, Hellmuth, Marc, Stadler, Peter F.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369769/
https://www.ncbi.nlm.nih.gov/pubmed/34404422
http://dx.doi.org/10.1186/s13015-021-00196-3
_version_ 1783739355016200192
author Schaller, David
Geiß, Manuela
Hellmuth, Marc
Stadler, Peter F.
author_facet Schaller, David
Geiß, Manuela
Hellmuth, Marc
Stadler, Peter F.
author_sort Schaller, David
collection PubMed
description BACKGROUND: Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. RESULTS: Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho’s supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. CONCLUSION: Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.
format Online
Article
Text
id pubmed-8369769
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-83697692021-08-18 Heuristic algorithms for best match graph editing Schaller, David Geiß, Manuela Hellmuth, Marc Stadler, Peter F. Algorithms Mol Biol Research BACKGROUND: Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. RESULTS: Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho’s supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. CONCLUSION: Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods. BioMed Central 2021-08-17 /pmc/articles/PMC8369769/ /pubmed/34404422 http://dx.doi.org/10.1186/s13015-021-00196-3 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Schaller, David
Geiß, Manuela
Hellmuth, Marc
Stadler, Peter F.
Heuristic algorithms for best match graph editing
title Heuristic algorithms for best match graph editing
title_full Heuristic algorithms for best match graph editing
title_fullStr Heuristic algorithms for best match graph editing
title_full_unstemmed Heuristic algorithms for best match graph editing
title_short Heuristic algorithms for best match graph editing
title_sort heuristic algorithms for best match graph editing
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369769/
https://www.ncbi.nlm.nih.gov/pubmed/34404422
http://dx.doi.org/10.1186/s13015-021-00196-3
work_keys_str_mv AT schallerdavid heuristicalgorithmsforbestmatchgraphediting
AT geißmanuela heuristicalgorithmsforbestmatchgraphediting
AT hellmuthmarc heuristicalgorithmsforbestmatchgraphediting
AT stadlerpeterf heuristicalgorithmsforbestmatchgraphediting