Cargando…

libFLASM: a software library for fixed-length approximate string matching

BACKGROUND: Approximate string matching is the problem of finding all factors of a given text that are at a distance at most k from a given pattern. Fixed-length approximate string matching is the problem of finding all factors of a text of length n that are at a distance at most k from any factor o...

Descripción completa

Detalles Bibliográficos
Autores principales: Ayad, Lorraine A. K., Pissis, Solon P., Retha, Ahmad
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5103500/
https://www.ncbi.nlm.nih.gov/pubmed/27832739
http://dx.doi.org/10.1186/s12859-016-1320-2
_version_ 1782466605380272128
author Ayad, Lorraine A. K.
Pissis, Solon P.
Retha, Ahmad
author_facet Ayad, Lorraine A. K.
Pissis, Solon P.
Retha, Ahmad
author_sort Ayad, Lorraine A. K.
collection PubMed
description BACKGROUND: Approximate string matching is the problem of finding all factors of a given text that are at a distance at most k from a given pattern. Fixed-length approximate string matching is the problem of finding all factors of a text of length n that are at a distance at most k from any factor of length ℓ of a pattern of length m. There exist bit-vector techniques to solve the fixed-length approximate string matching problem in time [Formula: see text] and space [Formula: see text] under the edit and Hamming distance models, where w is the size of the computer word; as such these techniques are independent of the distance threshold k or the alphabet size. Fixed-length approximate string matching is a generalisation of approximate string matching and, hence, has numerous direct applications in computational molecular biology and elsewhere. RESULTS: We present and make available libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching under both the edit and the Hamming distance models. Moreover we describe how fixed-length approximate string matching is applied to solve real problems by incorporating libFLASM into established applications for multiple circular sequence alignment as well as single and structured motif extraction. Specifically, we describe how it can be used to improve the accuracy of multiple circular sequence alignment in terms of the inferred likelihood-based phylogenies; and we also describe how it is used to efficiently find motifs in molecular sequences representing regulatory or functional regions. The comparison of the performance of the library to other algorithms show how it is competitive, especially with increasing distance thresholds. CONCLUSIONS: Fixed-length approximate string matching is a generalisation of the classic approximate string matching problem. We present libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching. The extensive experimental results presented here suggest that other applications could benefit from using libFLASM, and thus further maintenance and development of libFLASM is desirable.
format Online
Article
Text
id pubmed-5103500
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-51035002016-11-14 libFLASM: a software library for fixed-length approximate string matching Ayad, Lorraine A. K. Pissis, Solon P. Retha, Ahmad BMC Bioinformatics Software BACKGROUND: Approximate string matching is the problem of finding all factors of a given text that are at a distance at most k from a given pattern. Fixed-length approximate string matching is the problem of finding all factors of a text of length n that are at a distance at most k from any factor of length ℓ of a pattern of length m. There exist bit-vector techniques to solve the fixed-length approximate string matching problem in time [Formula: see text] and space [Formula: see text] under the edit and Hamming distance models, where w is the size of the computer word; as such these techniques are independent of the distance threshold k or the alphabet size. Fixed-length approximate string matching is a generalisation of approximate string matching and, hence, has numerous direct applications in computational molecular biology and elsewhere. RESULTS: We present and make available libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching under both the edit and the Hamming distance models. Moreover we describe how fixed-length approximate string matching is applied to solve real problems by incorporating libFLASM into established applications for multiple circular sequence alignment as well as single and structured motif extraction. Specifically, we describe how it can be used to improve the accuracy of multiple circular sequence alignment in terms of the inferred likelihood-based phylogenies; and we also describe how it is used to efficiently find motifs in molecular sequences representing regulatory or functional regions. The comparison of the performance of the library to other algorithms show how it is competitive, especially with increasing distance thresholds. CONCLUSIONS: Fixed-length approximate string matching is a generalisation of the classic approximate string matching problem. We present libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching. The extensive experimental results presented here suggest that other applications could benefit from using libFLASM, and thus further maintenance and development of libFLASM is desirable. BioMed Central 2016-11-10 /pmc/articles/PMC5103500/ /pubmed/27832739 http://dx.doi.org/10.1186/s12859-016-1320-2 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Ayad, Lorraine A. K.
Pissis, Solon P.
Retha, Ahmad
libFLASM: a software library for fixed-length approximate string matching
title libFLASM: a software library for fixed-length approximate string matching
title_full libFLASM: a software library for fixed-length approximate string matching
title_fullStr libFLASM: a software library for fixed-length approximate string matching
title_full_unstemmed libFLASM: a software library for fixed-length approximate string matching
title_short libFLASM: a software library for fixed-length approximate string matching
title_sort libflasm: a software library for fixed-length approximate string matching
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5103500/
https://www.ncbi.nlm.nih.gov/pubmed/27832739
http://dx.doi.org/10.1186/s12859-016-1320-2
work_keys_str_mv AT ayadlorraineak libflasmasoftwarelibraryforfixedlengthapproximatestringmatching
AT pississolonp libflasmasoftwarelibraryforfixedlengthapproximatestringmatching
AT rethaahmad libflasmasoftwarelibraryforfixedlengthapproximatestringmatching