Cargando…

Dynamic partitioning of search patterns for approximate pattern matching using search schemes

Search schemes constitute a flexible and generic framework to describe how all approximate occurrences of a search pattern in a text can be found efficiently. We propose an algorithm for the dynamic partitioning of search patterns which can be universally applied to any kind of search scheme and dem...

Descripción completa

Detalles Bibliográficos
Autores principales: Renders, Luca, Marchal, Kathleen, Fostier, Jan
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Elsevier 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8246400/
https://www.ncbi.nlm.nih.gov/pubmed/34235407
http://dx.doi.org/10.1016/j.isci.2021.102687
_version_ 1783716306711740416
author Renders, Luca
Marchal, Kathleen
Fostier, Jan
author_facet Renders, Luca
Marchal, Kathleen
Fostier, Jan
author_sort Renders, Luca
collection PubMed
description Search schemes constitute a flexible and generic framework to describe how all approximate occurrences of a search pattern in a text can be found efficiently. We propose an algorithm for the dynamic partitioning of search patterns which can be universally applied to any kind of search scheme and demonstrate that this technique significantly reduces the search space. We present Columba, a software tool written in C++, in which a multitude of search schemes are implemented. We discuss implementation aspects such as memory interleaving of Burrows-Wheeler transform representations and the reduction of redundancy that is inherently associated with the edit distance metric. Ultimately, we demonstrate that Columba has superior performance to the state of the art. Using a single CPU core, Columba is able to retrieve all occurrences of 100,000 Illumina reads and their reverse complements within a maximum edit distance of four in the human genome in less than 3 min.
format Online
Article
Text
id pubmed-8246400
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-82464002021-07-06 Dynamic partitioning of search patterns for approximate pattern matching using search schemes Renders, Luca Marchal, Kathleen Fostier, Jan iScience Article Search schemes constitute a flexible and generic framework to describe how all approximate occurrences of a search pattern in a text can be found efficiently. We propose an algorithm for the dynamic partitioning of search patterns which can be universally applied to any kind of search scheme and demonstrate that this technique significantly reduces the search space. We present Columba, a software tool written in C++, in which a multitude of search schemes are implemented. We discuss implementation aspects such as memory interleaving of Burrows-Wheeler transform representations and the reduction of redundancy that is inherently associated with the edit distance metric. Ultimately, we demonstrate that Columba has superior performance to the state of the art. Using a single CPU core, Columba is able to retrieve all occurrences of 100,000 Illumina reads and their reverse complements within a maximum edit distance of four in the human genome in less than 3 min. Elsevier 2021-06-10 /pmc/articles/PMC8246400/ /pubmed/34235407 http://dx.doi.org/10.1016/j.isci.2021.102687 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Article
Renders, Luca
Marchal, Kathleen
Fostier, Jan
Dynamic partitioning of search patterns for approximate pattern matching using search schemes
title Dynamic partitioning of search patterns for approximate pattern matching using search schemes
title_full Dynamic partitioning of search patterns for approximate pattern matching using search schemes
title_fullStr Dynamic partitioning of search patterns for approximate pattern matching using search schemes
title_full_unstemmed Dynamic partitioning of search patterns for approximate pattern matching using search schemes
title_short Dynamic partitioning of search patterns for approximate pattern matching using search schemes
title_sort dynamic partitioning of search patterns for approximate pattern matching using search schemes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8246400/
https://www.ncbi.nlm.nih.gov/pubmed/34235407
http://dx.doi.org/10.1016/j.isci.2021.102687
work_keys_str_mv AT rendersluca dynamicpartitioningofsearchpatternsforapproximatepatternmatchingusingsearchschemes
AT marchalkathleen dynamicpartitioningofsearchpatternsforapproximatepatternmatchingusingsearchschemes
AT fostierjan dynamicpartitioningofsearchpatternsforapproximatepatternmatchingusingsearchschemes