Cargando…
Dynamic partitioning of search patterns for approximate pattern matching using search schemes
Search schemes constitute a flexible and generic framework to describe how all approximate occurrences of a search pattern in a text can be found efficiently. We propose an algorithm for the dynamic partitioning of search patterns which can be universally applied to any kind of search scheme and dem...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Elsevier
2021
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8246400/ https://www.ncbi.nlm.nih.gov/pubmed/34235407 http://dx.doi.org/10.1016/j.isci.2021.102687 |
_version_ | 1783716306711740416 |
---|---|
author | Renders, Luca Marchal, Kathleen Fostier, Jan |
author_facet | Renders, Luca Marchal, Kathleen Fostier, Jan |
author_sort | Renders, Luca |
collection | PubMed |
description | Search schemes constitute a flexible and generic framework to describe how all approximate occurrences of a search pattern in a text can be found efficiently. We propose an algorithm for the dynamic partitioning of search patterns which can be universally applied to any kind of search scheme and demonstrate that this technique significantly reduces the search space. We present Columba, a software tool written in C++, in which a multitude of search schemes are implemented. We discuss implementation aspects such as memory interleaving of Burrows-Wheeler transform representations and the reduction of redundancy that is inherently associated with the edit distance metric. Ultimately, we demonstrate that Columba has superior performance to the state of the art. Using a single CPU core, Columba is able to retrieve all occurrences of 100,000 Illumina reads and their reverse complements within a maximum edit distance of four in the human genome in less than 3 min. |
format | Online Article Text |
id | pubmed-8246400 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-82464002021-07-06 Dynamic partitioning of search patterns for approximate pattern matching using search schemes Renders, Luca Marchal, Kathleen Fostier, Jan iScience Article Search schemes constitute a flexible and generic framework to describe how all approximate occurrences of a search pattern in a text can be found efficiently. We propose an algorithm for the dynamic partitioning of search patterns which can be universally applied to any kind of search scheme and demonstrate that this technique significantly reduces the search space. We present Columba, a software tool written in C++, in which a multitude of search schemes are implemented. We discuss implementation aspects such as memory interleaving of Burrows-Wheeler transform representations and the reduction of redundancy that is inherently associated with the edit distance metric. Ultimately, we demonstrate that Columba has superior performance to the state of the art. Using a single CPU core, Columba is able to retrieve all occurrences of 100,000 Illumina reads and their reverse complements within a maximum edit distance of four in the human genome in less than 3 min. Elsevier 2021-06-10 /pmc/articles/PMC8246400/ /pubmed/34235407 http://dx.doi.org/10.1016/j.isci.2021.102687 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Article Renders, Luca Marchal, Kathleen Fostier, Jan Dynamic partitioning of search patterns for approximate pattern matching using search schemes |
title | Dynamic partitioning of search patterns for approximate pattern matching using search schemes |
title_full | Dynamic partitioning of search patterns for approximate pattern matching using search schemes |
title_fullStr | Dynamic partitioning of search patterns for approximate pattern matching using search schemes |
title_full_unstemmed | Dynamic partitioning of search patterns for approximate pattern matching using search schemes |
title_short | Dynamic partitioning of search patterns for approximate pattern matching using search schemes |
title_sort | dynamic partitioning of search patterns for approximate pattern matching using search schemes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8246400/ https://www.ncbi.nlm.nih.gov/pubmed/34235407 http://dx.doi.org/10.1016/j.isci.2021.102687 |
work_keys_str_mv | AT rendersluca dynamicpartitioningofsearchpatternsforapproximatepatternmatchingusingsearchschemes AT marchalkathleen dynamicpartitioningofsearchpatternsforapproximatepatternmatchingusingsearchschemes AT fostierjan dynamicpartitioningofsearchpatternsforapproximatepatternmatchingusingsearchschemes |