Cargando…

Improved design and analysis of practical minimizers

MOTIVATION: Minimizers are methods to sample k-mers from a string, with the guarantee that similar set of k-mers will be chosen on similar strings. It is parameterized by the k-mer length k, a window length w and an order on the k-mers. Minimizers are used in a large number of softwares and pipeline...

Descripción completa

Detalles Bibliográficos
Autores principales: Zheng, Hongyu, Kingsford, Carl, Marçais, Guillaume
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248892/
https://www.ncbi.nlm.nih.gov/pubmed/32657376
http://dx.doi.org/10.1093/bioinformatics/btaa472
_version_ 1783716812033097728
author Zheng, Hongyu
Kingsford, Carl
Marçais, Guillaume
author_facet Zheng, Hongyu
Kingsford, Carl
Marçais, Guillaume
author_sort Zheng, Hongyu
collection PubMed
description MOTIVATION: Minimizers are methods to sample k-mers from a string, with the guarantee that similar set of k-mers will be chosen on similar strings. It is parameterized by the k-mer length k, a window length w and an order on the k-mers. Minimizers are used in a large number of softwares and pipelines to improve computation efficiency and decrease memory usage. Despite the method’s popularity, many theoretical questions regarding its performance remain open. The core metric for measuring performance of a minimizer is the density, which measures the sparsity of sampled k-mers. The theoretical optimal density for a minimizer is [Formula: see text] , provably not achievable in general. For given k and w, little is known about asymptotically optimal minimizers, that is minimizers with density [Formula: see text] . RESULTS: We derive a necessary and sufficient condition for existence of asymptotically optimal minimizers. We also provide a randomized algorithm, called the Miniception, to design minimizers with the best theoretical guarantee to date on density in practical scenarios. Constructing and using the Miniception is as easy as constructing and using a random minimizer, which allows the design of efficient minimizers that scale to the values of k and w used in current bioinformatics software programs. AVAILABILITY AND IMPLEMENTATION: Reference implementation of the Miniception and the codes for analysis can be found at https://github.com/kingsford-group/miniception. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8248892
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-82488922021-07-02 Improved design and analysis of practical minimizers Zheng, Hongyu Kingsford, Carl Marçais, Guillaume Bioinformatics Comparative and Functional Genomics MOTIVATION: Minimizers are methods to sample k-mers from a string, with the guarantee that similar set of k-mers will be chosen on similar strings. It is parameterized by the k-mer length k, a window length w and an order on the k-mers. Minimizers are used in a large number of softwares and pipelines to improve computation efficiency and decrease memory usage. Despite the method’s popularity, many theoretical questions regarding its performance remain open. The core metric for measuring performance of a minimizer is the density, which measures the sparsity of sampled k-mers. The theoretical optimal density for a minimizer is [Formula: see text] , provably not achievable in general. For given k and w, little is known about asymptotically optimal minimizers, that is minimizers with density [Formula: see text] . RESULTS: We derive a necessary and sufficient condition for existence of asymptotically optimal minimizers. We also provide a randomized algorithm, called the Miniception, to design minimizers with the best theoretical guarantee to date on density in practical scenarios. Constructing and using the Miniception is as easy as constructing and using a random minimizer, which allows the design of efficient minimizers that scale to the values of k and w used in current bioinformatics software programs. AVAILABILITY AND IMPLEMENTATION: Reference implementation of the Miniception and the codes for analysis can be found at https://github.com/kingsford-group/miniception. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-07 2020-07-13 /pmc/articles/PMC8248892/ /pubmed/32657376 http://dx.doi.org/10.1093/bioinformatics/btaa472 Text en © The Author(s) 2020. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/ (https://creativecommons.org/licenses/by-nc/4.0/) ), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Comparative and Functional Genomics
Zheng, Hongyu
Kingsford, Carl
Marçais, Guillaume
Improved design and analysis of practical minimizers
title Improved design and analysis of practical minimizers
title_full Improved design and analysis of practical minimizers
title_fullStr Improved design and analysis of practical minimizers
title_full_unstemmed Improved design and analysis of practical minimizers
title_short Improved design and analysis of practical minimizers
title_sort improved design and analysis of practical minimizers
topic Comparative and Functional Genomics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248892/
https://www.ncbi.nlm.nih.gov/pubmed/32657376
http://dx.doi.org/10.1093/bioinformatics/btaa472
work_keys_str_mv AT zhenghongyu improveddesignandanalysisofpracticalminimizers
AT kingsfordcarl improveddesignandanalysisofpracticalminimizers
AT marcaisguillaume improveddesignandanalysisofpracticalminimizers