Cargando…

Fast algorithms for approximate circular string matching

BACKGROUND: Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate ci...

Descripción completa

Detalles Bibliográficos
Autores principales: Barton, Carl, Iliopoulos, Costas S, Pissis, Solon P
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234210/
https://www.ncbi.nlm.nih.gov/pubmed/24656145
http://dx.doi.org/10.1186/1748-7188-9-9
_version_ 1782344811322277888
author Barton, Carl
Iliopoulos, Costas S
Pissis, Solon P
author_facet Barton, Carl
Iliopoulos, Costas S
Pissis, Solon P
author_sort Barton, Carl
collection PubMed
description BACKGROUND: Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area. RESULTS: In this article, we present a suboptimal average-case algorithm for exact circular string matching requiring time [Formula: see text] . Based on our solution for the exact case, we present two fast average-case algorithms for approximate circular string matching with k-mismatches, under the Hamming distance model, requiring time [Formula: see text] for moderate values of k, that is [Formula: see text] . We show how the same results can be easily obtained under the edit distance model. The presented algorithms are also implemented as library functions. Experimental results demonstrate that the functions provided in this library accelerate the computations by more than three orders of magnitude compared to a naïve approach. CONCLUSIONS: We present two fast average-case algorithms for approximate circular string matching with k-mismatches; and show that they also perform very well in practice. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any biological pipeline. The source code of the library is freely available at http://www.inf.kcl.ac.uk/research/projects/asmf/.
format Online
Article
Text
id pubmed-4234210
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-42342102014-11-18 Fast algorithms for approximate circular string matching Barton, Carl Iliopoulos, Costas S Pissis, Solon P Algorithms Mol Biol Research BACKGROUND: Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area. RESULTS: In this article, we present a suboptimal average-case algorithm for exact circular string matching requiring time [Formula: see text] . Based on our solution for the exact case, we present two fast average-case algorithms for approximate circular string matching with k-mismatches, under the Hamming distance model, requiring time [Formula: see text] for moderate values of k, that is [Formula: see text] . We show how the same results can be easily obtained under the edit distance model. The presented algorithms are also implemented as library functions. Experimental results demonstrate that the functions provided in this library accelerate the computations by more than three orders of magnitude compared to a naïve approach. CONCLUSIONS: We present two fast average-case algorithms for approximate circular string matching with k-mismatches; and show that they also perform very well in practice. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any biological pipeline. The source code of the library is freely available at http://www.inf.kcl.ac.uk/research/projects/asmf/. BioMed Central 2014-03-22 /pmc/articles/PMC4234210/ /pubmed/24656145 http://dx.doi.org/10.1186/1748-7188-9-9 Text en Copyright © 2014 Barton et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Barton, Carl
Iliopoulos, Costas S
Pissis, Solon P
Fast algorithms for approximate circular string matching
title Fast algorithms for approximate circular string matching
title_full Fast algorithms for approximate circular string matching
title_fullStr Fast algorithms for approximate circular string matching
title_full_unstemmed Fast algorithms for approximate circular string matching
title_short Fast algorithms for approximate circular string matching
title_sort fast algorithms for approximate circular string matching
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234210/
https://www.ncbi.nlm.nih.gov/pubmed/24656145
http://dx.doi.org/10.1186/1748-7188-9-9
work_keys_str_mv AT bartoncarl fastalgorithmsforapproximatecircularstringmatching
AT iliopouloscostass fastalgorithmsforapproximatecircularstringmatching
AT pississolonp fastalgorithmsforapproximatecircularstringmatching