Improved algorithms for approximate string matching (extended abstract)

BACKGROUND: The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants of the problem, including comparison of two stri...

Bibliographic Details
Main authors: Papamichail, Dimitris; Papamichail, Georgios
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648743/
https://www.ncbi.nlm.nih.gov/pubmed/19208109
http://dx.doi.org/10.1186/1471-2105-10-S1-S10
_version_ 1782164977763745792
author Papamichail, Dimitris
Papamichail, Georgios
author_facet Papamichail, Dimitris
Papamichail, Georgios
author_sort Papamichail, Dimitris
collection PubMed
description BACKGROUND: The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants of the problem, including comparison of two strings, approximate pattern identification in a string or calculation of the longest common subsequence that two strings share. RESULTS: We designed an output sensitive algorithm solving the edit distance problem between two strings of lengths n and m respectively in time O((s - |n - m|)·min(m, n, s) + m + n) and linear space, where s is the edit distance between the two strings. This worst-case time bound sets the quadratic factor of the algorithm independent of the longest string length and improves existing theoretical bounds for this problem. The implementation of our algorithm also excels in practice, especially in cases where the two strings compared differ significantly in length. CONCLUSION: We have provided the design, analysis and implementation of a new algorithm for calculating the edit distance of two strings with both theoretical and practical implications. Source code of our algorithm is available online.
format Text
id pubmed-2648743
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2648743 2009-03-03 Improved algorithms for approximate string matching (extended abstract) Papamichail, Dimitris Papamichail, Georgios BMC Bioinformatics Research BACKGROUND: The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants of the problem, including comparison of two strings, approximate pattern identification in a string or calculation of the longest common subsequence that two strings share. RESULTS: We designed an output sensitive algorithm solving the edit distance problem between two strings of lengths n and m respectively in time O((s - |n - m|)·min(m, n, s) + m + n) and linear space, where s is the edit distance between the two strings. This worst-case time bound sets the quadratic factor of the algorithm independent of the longest string length and improves existing theoretical bounds for this problem. The implementation of our algorithm also excels in practice, especially in cases where the two strings compared differ significantly in length. CONCLUSION: We have provided the design, analysis and implementation of a new algorithm for calculating the edit distance of two strings with both theoretical and practical implications. Source code of our algorithm is available online. BioMed Central 2009-01-30 /pmc/articles/PMC2648743/ /pubmed/19208109 http://dx.doi.org/10.1186/1471-2105-10-S1-S10 Text en Copyright © 2009 Papamichail and Papamichail; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Papamichail, Dimitris
Papamichail, Georgios
Improved algorithms for approximate string matching (extended abstract)
title Improved algorithms for approximate string matching (extended abstract)
title_full Improved algorithms for approximate string matching (extended abstract)
title_fullStr Improved algorithms for approximate string matching (extended abstract)
title_full_unstemmed Improved algorithms for approximate string matching (extended abstract)
title_short Improved algorithms for approximate string matching (extended abstract)
title_sort improved algorithms for approximate string matching (extended abstract)
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648743/
https://www.ncbi.nlm.nih.gov/pubmed/19208109
http://dx.doi.org/10.1186/1471-2105-10-S1-S10
work_keys_str_mv AT papamichaildimitris improvedalgorithmsforapproximatestringmatchingextendedabstract
AT papamichailgeorgios improvedalgorithmsforapproximatestringmatchingextendedabstract
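
Illustrative note on the abstract above: the record does not reproduce the authors' algorithm, so the following is only a minimal sketch of what "output sensitive" can mean for edit distance. It uses the well-known diagonal-band technique with threshold doubling, not the method of the paper, and it does not achieve the bound quoted in the abstract. The function names `banded_edit_distance` and `edit_distance` are hypothetical. The sketch keeps two rows in memory (linear space) and fills only cells within t of the main diagonal, so the work per round grows with the threshold t rather than with the full n × m table.

```python
def banded_edit_distance(a, b, t):
    """Return the edit distance of a and b if it is <= t, otherwise None.

    Only cells (i, j) with |i - j| <= t are filled. Any alignment of cost
    <= t stays inside that band, so a band result <= t is the true distance.
    """
    n, m = len(a), len(b)
    if abs(n - m) > t:
        return None  # the length difference alone already exceeds t
    inf = n + m + 1  # larger than any possible edit distance
    prev = [inf] * (m + 1)
    cur = [inf] * (m + 1)
    for j in range(min(m, t) + 1):
        prev[j] = j  # row 0: j insertions
    for i in range(1, n + 1):
        lo, hi = max(0, i - t), min(m, i + t)
        if lo > 0:
            cur[lo - 1] = inf  # cell just left of the band is unreachable
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i  # column 0: i deletions
                continue
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # deletion from a
                         cur[j - 1] + 1)      # insertion into a
        prev, cur = cur, prev
    d = prev[m]
    return d if d <= t else None


def edit_distance(a, b):
    """Edit distance via threshold doubling over the banded routine."""
    if len(a) > len(b):
        a, b = b, a  # iterate rows over the shorter string
    t = max(1, len(b) - len(a))
    while True:
        d = banded_edit_distance(a, b, t)
        if d is not None:
            return d
        t *= 2  # band was too narrow; widen and retry


print(edit_distance("kitten", "sitting"))
```

Because the threshold doubles until it reaches the true distance s, the number of cells filled is dominated by the last round, which is roughly proportional to s times the length of the shorter string. Strings that are close to each other are therefore compared much faster than with the full quadratic table.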