Cargando…

copMEM2: robust and scalable maximum exact match finding

SUMMARY: Finding Maximum Exact Matches, i.e. matches between two strings that cannot be further extended to the left or right, is a classic string problem with applications in genome-to-genome comparisons. The existing tools rarely explicitly address the problem of MEM finding for a pair of very sim...

Descripción completa

Detalles Bibliográficos
Autores principales: Grabowski, Szymon, Bieniecki, Wojciech
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10209524/
https://www.ncbi.nlm.nih.gov/pubmed/37171886
http://dx.doi.org/10.1093/bioinformatics/btad313
_version_ 1785046893848428544
author Grabowski, Szymon
Bieniecki, Wojciech
author_facet Grabowski, Szymon
Bieniecki, Wojciech
author_sort Grabowski, Szymon
collection PubMed
description SUMMARY: Finding Maximum Exact Matches, i.e. matches between two strings that cannot be further extended to the left or right, is a classic string problem with applications in genome-to-genome comparisons. The existing tools rarely explicitly address the problem of MEM finding for a pair of very similar genomes, which may be computationally challenging. We present copMEM2, a multithreaded implementation of its predecessor. Together with a few optimizations, including a carefully built predecessor query data structure and sort procedure selection, and taking care for highly similar data, copMEM2 allows to compute all MEMs of minimum length 50 between the human and mouse genomes in 59 s, using 10.40 GB of RAM and 12 threads, being at least a few times faster than its main contenders. On a pair of human genomes, hg18 and hg19, the results are 324 s and 16.57 GB, respectively. AVAILABILITY AND IMPLEMENTATION: copMEM2 is available at https://github.com/wbieniec/copmem2.
format Online
Article
Text
id pubmed-10209524
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-102095242023-05-26 copMEM2: robust and scalable maximum exact match finding Grabowski, Szymon Bieniecki, Wojciech Bioinformatics Applications Note SUMMARY: Finding Maximum Exact Matches, i.e. matches between two strings that cannot be further extended to the left or right, is a classic string problem with applications in genome-to-genome comparisons. The existing tools rarely explicitly address the problem of MEM finding for a pair of very similar genomes, which may be computationally challenging. We present copMEM2, a multithreaded implementation of its predecessor. Together with a few optimizations, including a carefully built predecessor query data structure and sort procedure selection, and taking care for highly similar data, copMEM2 allows to compute all MEMs of minimum length 50 between the human and mouse genomes in 59 s, using 10.40 GB of RAM and 12 threads, being at least a few times faster than its main contenders. On a pair of human genomes, hg18 and hg19, the results are 324 s and 16.57 GB, respectively. AVAILABILITY AND IMPLEMENTATION: copMEM2 is available at https://github.com/wbieniec/copmem2. Oxford University Press 2023-05-12 /pmc/articles/PMC10209524/ /pubmed/37171886 http://dx.doi.org/10.1093/bioinformatics/btad313 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Applications Note
Grabowski, Szymon
Bieniecki, Wojciech
copMEM2: robust and scalable maximum exact match finding
title copMEM2: robust and scalable maximum exact match finding
title_full copMEM2: robust and scalable maximum exact match finding
title_fullStr copMEM2: robust and scalable maximum exact match finding
title_full_unstemmed copMEM2: robust and scalable maximum exact match finding
title_short copMEM2: robust and scalable maximum exact match finding
title_sort copmem2: robust and scalable maximum exact match finding
topic Applications Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10209524/
https://www.ncbi.nlm.nih.gov/pubmed/37171886
http://dx.doi.org/10.1093/bioinformatics/btad313
work_keys_str_mv AT grabowskiszymon copmem2robustandscalablemaximumexactmatchfinding
AT bienieckiwojciech copmem2robustandscalablemaximumexactmatchfinding