Cargando…

Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants

The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al.~(2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between...

Descripción completa

Detalles Bibliográficos
Autores principales: Qiu, Yutong, Shen, Yihang, Kingsford, Carl
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Cornell University 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10246088/
https://www.ncbi.nlm.nih.gov/pubmed/37292475
_version_ 1785145907517325312
author Qiu, Yutong
Shen, Yihang
Kingsford, Carl
author_facet Qiu, Yutong
Shen, Yihang
Kingsford, Carl
author_sort Qiu, Yutong
collection PubMed
description The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al.~(2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al.~(2018) propose two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs always yields optimal integer solutions. The claim that GTED is polynomially solvable is contradictory to the complexity results of existing string-to-graph matching problems. We resolve this conflict in complexity results by proving that GTED is NP-complete and showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead solve for a lower bound of GTED and are not solvable in polynomial time. In addition, we provide the first two, correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point to the direction of heuristics. The source code to reproduce experimental results is available at https://github.com/Kingsford-Group/gtednewilp/.
format Online
Article
Text
id pubmed-10246088
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cornell University
record_format MEDLINE/PubMed
spelling pubmed-102460882023-11-14 Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants Qiu, Yutong Shen, Yihang Kingsford, Carl ArXiv Article The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al.~(2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al.~(2018) propose two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs always yields optimal integer solutions. The claim that GTED is polynomially solvable is contradictory to the complexity results of existing string-to-graph matching problems. We resolve this conflict in complexity results by proving that GTED is NP-complete and showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead solve for a lower bound of GTED and are not solvable in polynomial time. In addition, we provide the first two, correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point to the direction of heuristics. The source code to reproduce experimental results is available at https://github.com/Kingsford-Group/gtednewilp/. Cornell University 2023-11-08 /pmc/articles/PMC10246088/ /pubmed/37292475 Text en https://creativecommons.org/licenses/by/4.0/This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/) , which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle Article
Qiu, Yutong
Shen, Yihang
Kingsford, Carl
Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants
title Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants
title_full Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants
title_fullStr Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants
title_full_unstemmed Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants
title_short Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants
title_sort revisiting the complexity of and algorithms for the graph traversal edit distance and its variants
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10246088/
https://www.ncbi.nlm.nih.gov/pubmed/37292475
work_keys_str_mv AT qiuyutong revisitingthecomplexityofandalgorithmsforthegraphtraversaleditdistanceanditsvariants
AT shenyihang revisitingthecomplexityofandalgorithmsforthegraphtraversaleditdistanceanditsvariants
AT kingsfordcarl revisitingthecomplexityofandalgorithmsforthegraphtraversaleditdistanceanditsvariants