Cargando…
Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants
The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al.~(2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Cornell University
2023
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10246088/ https://www.ncbi.nlm.nih.gov/pubmed/37292475 |
_version_ | 1785145907517325312 |
---|---|
author | Qiu, Yutong Shen, Yihang Kingsford, Carl |
author_facet | Qiu, Yutong Shen, Yihang Kingsford, Carl |
author_sort | Qiu, Yutong |
collection | PubMed |
description | The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al.~(2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al.~(2018) propose two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs always yields optimal integer solutions. The claim that GTED is polynomially solvable is contradictory to the complexity results of existing string-to-graph matching problems. We resolve this conflict in complexity results by proving that GTED is NP-complete and showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead solve for a lower bound of GTED and are not solvable in polynomial time. In addition, we provide the first two, correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point to the direction of heuristics. The source code to reproduce experimental results is available at https://github.com/Kingsford-Group/gtednewilp/. |
format | Online Article Text |
id | pubmed-10246088 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cornell University |
record_format | MEDLINE/PubMed |
spelling | pubmed-102460882023-11-14 Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants Qiu, Yutong Shen, Yihang Kingsford, Carl ArXiv Article The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al.~(2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al.~(2018) propose two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs always yields optimal integer solutions. The claim that GTED is polynomially solvable is contradictory to the complexity results of existing string-to-graph matching problems. We resolve this conflict in complexity results by proving that GTED is NP-complete and showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead solve for a lower bound of GTED and are not solvable in polynomial time. In addition, we provide the first two, correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point to the direction of heuristics. The source code to reproduce experimental results is available at https://github.com/Kingsford-Group/gtednewilp/. Cornell University 2023-11-08 /pmc/articles/PMC10246088/ /pubmed/37292475 Text en https://creativecommons.org/licenses/by/4.0/This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/) , which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Qiu, Yutong Shen, Yihang Kingsford, Carl Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants |
title | Revisiting the Complexity of and Algorithms for the Graph Traversal Edit
Distance and Its Variants |
title_full | Revisiting the Complexity of and Algorithms for the Graph Traversal Edit
Distance and Its Variants |
title_fullStr | Revisiting the Complexity of and Algorithms for the Graph Traversal Edit
Distance and Its Variants |
title_full_unstemmed | Revisiting the Complexity of and Algorithms for the Graph Traversal Edit
Distance and Its Variants |
title_short | Revisiting the Complexity of and Algorithms for the Graph Traversal Edit
Distance and Its Variants |
title_sort | revisiting the complexity of and algorithms for the graph traversal edit
distance and its variants |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10246088/ https://www.ncbi.nlm.nih.gov/pubmed/37292475 |
work_keys_str_mv | AT qiuyutong revisitingthecomplexityofandalgorithmsforthegraphtraversaleditdistanceanditsvariants AT shenyihang revisitingthecomplexityofandalgorithmsforthegraphtraversaleditdistanceanditsvariants AT kingsfordcarl revisitingthecomplexityofandalgorithmsforthegraphtraversaleditdistanceanditsvariants |