Cargando…
Asymptotically optimal minimizers schemes
MOTIVATION: The minimizers technique is a method to sample k-mers that is used in many bioinformatics software to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers i...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Oxford University Press
2018
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6037127/ https://www.ncbi.nlm.nih.gov/pubmed/29949995 http://dx.doi.org/10.1093/bioinformatics/bty258 |
_version_ | 1783338284703809536 |
---|---|
author | Marçais, Guillaume DeBlasio, Dan Kingsford, Carl |
author_facet | Marçais, Guillaume DeBlasio, Dan Kingsford, Carl |
author_sort | Marçais, Guillaume |
collection | PubMed |
description | MOTIVATION: The minimizers technique is a method to sample k-mers that is used in many bioinformatics software to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k-mers as possible (i.e. having a low density) is beneficial. The density is highly dependent on the choice of the order on the k-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density and thereby making existing and future bioinformatics tools even more efficient. RESULTS: From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare k-mers. These orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the three type of schemes. |
format | Online Article Text |
id | pubmed-6037127 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-60371272018-07-12 Asymptotically optimal minimizers schemes Marçais, Guillaume DeBlasio, Dan Kingsford, Carl Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings MOTIVATION: The minimizers technique is a method to sample k-mers that is used in many bioinformatics software to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k-mers as possible (i.e. having a low density) is beneficial. The density is highly dependent on the choice of the order on the k-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density and thereby making existing and future bioinformatics tools even more efficient. RESULTS: From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare k-mers. These orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the three type of schemes. Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6037127/ /pubmed/29949995 http://dx.doi.org/10.1093/bioinformatics/bty258 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings Marçais, Guillaume DeBlasio, Dan Kingsford, Carl Asymptotically optimal minimizers schemes |
title | Asymptotically optimal minimizers schemes |
title_full | Asymptotically optimal minimizers schemes |
title_fullStr | Asymptotically optimal minimizers schemes |
title_full_unstemmed | Asymptotically optimal minimizers schemes |
title_short | Asymptotically optimal minimizers schemes |
title_sort | asymptotically optimal minimizers schemes |
topic | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6037127/ https://www.ncbi.nlm.nih.gov/pubmed/29949995 http://dx.doi.org/10.1093/bioinformatics/bty258 |
work_keys_str_mv | AT marcaisguillaume asymptoticallyoptimalminimizersschemes AT deblasiodan asymptoticallyoptimalminimizersschemes AT kingsfordcarl asymptoticallyoptimalminimizersschemes |