Cargando…
Improved design and analysis of practical minimizers
MOTIVATION: Minimizers are methods to sample k-mers from a string, with the guarantee that similar set of k-mers will be chosen on similar strings. It is parameterized by the k-mer length k, a window length w and an order on the k-mers. Minimizers are used in a large number of softwares and pipeline...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Oxford University Press
2020
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248892/ https://www.ncbi.nlm.nih.gov/pubmed/32657376 http://dx.doi.org/10.1093/bioinformatics/btaa472 |
_version_ | 1783716812033097728 |
---|---|
author | Zheng, Hongyu Kingsford, Carl Marçais, Guillaume |
author_facet | Zheng, Hongyu Kingsford, Carl Marçais, Guillaume |
author_sort | Zheng, Hongyu |
collection | PubMed |
description | MOTIVATION: Minimizers are methods to sample k-mers from a string, with the guarantee that similar set of k-mers will be chosen on similar strings. It is parameterized by the k-mer length k, a window length w and an order on the k-mers. Minimizers are used in a large number of softwares and pipelines to improve computation efficiency and decrease memory usage. Despite the method’s popularity, many theoretical questions regarding its performance remain open. The core metric for measuring performance of a minimizer is the density, which measures the sparsity of sampled k-mers. The theoretical optimal density for a minimizer is [Formula: see text] , provably not achievable in general. For given k and w, little is known about asymptotically optimal minimizers, that is minimizers with density [Formula: see text] . RESULTS: We derive a necessary and sufficient condition for existence of asymptotically optimal minimizers. We also provide a randomized algorithm, called the Miniception, to design minimizers with the best theoretical guarantee to date on density in practical scenarios. Constructing and using the Miniception is as easy as constructing and using a random minimizer, which allows the design of efficient minimizers that scale to the values of k and w used in current bioinformatics software programs. AVAILABILITY AND IMPLEMENTATION: Reference implementation of the Miniception and the codes for analysis can be found at https://github.com/kingsford-group/miniception. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-8248892 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-82488922021-07-02 Improved design and analysis of practical minimizers Zheng, Hongyu Kingsford, Carl Marçais, Guillaume Bioinformatics Comparative and Functional Genomics MOTIVATION: Minimizers are methods to sample k-mers from a string, with the guarantee that similar set of k-mers will be chosen on similar strings. It is parameterized by the k-mer length k, a window length w and an order on the k-mers. Minimizers are used in a large number of softwares and pipelines to improve computation efficiency and decrease memory usage. Despite the method’s popularity, many theoretical questions regarding its performance remain open. The core metric for measuring performance of a minimizer is the density, which measures the sparsity of sampled k-mers. The theoretical optimal density for a minimizer is [Formula: see text] , provably not achievable in general. For given k and w, little is known about asymptotically optimal minimizers, that is minimizers with density [Formula: see text] . RESULTS: We derive a necessary and sufficient condition for existence of asymptotically optimal minimizers. We also provide a randomized algorithm, called the Miniception, to design minimizers with the best theoretical guarantee to date on density in practical scenarios. Constructing and using the Miniception is as easy as constructing and using a random minimizer, which allows the design of efficient minimizers that scale to the values of k and w used in current bioinformatics software programs. AVAILABILITY AND IMPLEMENTATION: Reference implementation of the Miniception and the codes for analysis can be found at https://github.com/kingsford-group/miniception. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-07 2020-07-13 /pmc/articles/PMC8248892/ /pubmed/32657376 http://dx.doi.org/10.1093/bioinformatics/btaa472 Text en © The Author(s) 2020. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/ (https://creativecommons.org/licenses/by-nc/4.0/) ), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Comparative and Functional Genomics Zheng, Hongyu Kingsford, Carl Marçais, Guillaume Improved design and analysis of practical minimizers |
title | Improved design and analysis of practical minimizers |
title_full | Improved design and analysis of practical minimizers |
title_fullStr | Improved design and analysis of practical minimizers |
title_full_unstemmed | Improved design and analysis of practical minimizers |
title_short | Improved design and analysis of practical minimizers |
title_sort | improved design and analysis of practical minimizers |
topic | Comparative and Functional Genomics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248892/ https://www.ncbi.nlm.nih.gov/pubmed/32657376 http://dx.doi.org/10.1093/bioinformatics/btaa472 |
work_keys_str_mv | AT zhenghongyu improveddesignandanalysisofpracticalminimizers AT kingsfordcarl improveddesignandanalysisofpracticalminimizers AT marcaisguillaume improveddesignandanalysisofpracticalminimizers |