Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences
Minimizers are widely used to select subsets of fixed-length substrings (k-mers) from biological sequences in applications ranging from read mapping to taxonomy prediction and indexing of large datasets. The minimizer of a string of w consecutive k-mers is the k-mer with smallest value according to...
Main author: | Edgar, Robert |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2021 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7869670/ https://www.ncbi.nlm.nih.gov/pubmed/33604186 http://dx.doi.org/10.7717/peerj.10805 |
_version_ | 1783648672807911424 |
---|---|
author | Edgar, Robert |
author_facet | Edgar, Robert |
author_sort | Edgar, Robert |
collection | PubMed |
description | Minimizers are widely used to select subsets of fixed-length substrings (k-mers) from biological sequences in applications ranging from read mapping to taxonomy prediction and indexing of large datasets. The minimizer of a string of w consecutive k-mers is the k-mer with smallest value according to an ordering of all k-mers. Syncmers are defined here as a family of alternative methods which select k-mers by inspecting the position of the smallest-valued substring of length s < k within the k-mer. For example, a closed syncmer is selected if its smallest s-mer is at the start or end of the k-mer. At least one closed syncmer must be found in every window of length (k − s) k-mers. Unlike a minimizer, a syncmer is identified by its sequence alone, and is therefore synchronized in the following sense: if a given k-mer is selected from one sequence, it will also be selected from any other sequence. Also, minimizers can be deleted by mutations in flanking sequence, which cannot happen with syncmers. Experiments on minimizers with parameters used in the minimap2 read mapper and Kraken taxonomy prediction algorithm respectively show that syncmers can simultaneously achieve both lower density and higher conservation compared to minimizers. |
format | Online Article Text |
id | pubmed-7869670 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-78696702021-02-17 Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences Edgar, Robert PeerJ Bioinformatics Minimizers are widely used to select subsets of fixed-length substrings (k-mers) from biological sequences in applications ranging from read mapping to taxonomy prediction and indexing of large datasets. The minimizer of a string of w consecutive k-mers is the k-mer with smallest value according to an ordering of all k-mers. Syncmers are defined here as a family of alternative methods which select k-mers by inspecting the position of the smallest-valued substring of length s < k within the k-mer. For example, a closed syncmer is selected if its smallest s-mer is at the start or end of the k-mer. At least one closed syncmer must be found in every window of length (k − s) k-mers. Unlike a minimizer, a syncmer is identified by its sequence alone, and is therefore synchronized in the following sense: if a given k-mer is selected from one sequence, it will also be selected from any other sequence. Also, minimizers can be deleted by mutations in flanking sequence, which cannot happen with syncmers. Experiments on minimizers with parameters used in the minimap2 read mapper and Kraken taxonomy prediction algorithm respectively show that syncmers can simultaneously achieve both lower density and higher conservation compared to minimizers. PeerJ Inc. 2021-02-05 /pmc/articles/PMC7869670/ /pubmed/33604186 http://dx.doi.org/10.7717/peerj.10805 Text en © 2021 Edgar https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Edgar, Robert Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences |
title | Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences |
title_full | Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences |
title_fullStr | Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences |
title_full_unstemmed | Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences |
title_short | Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences |
title_sort | syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7869670/ https://www.ncbi.nlm.nih.gov/pubmed/33604186 http://dx.doi.org/10.7717/peerj.10805 |
work_keys_str_mv | AT edgarrobert syncmersaremoresensitivethanminimizersforselectingconservedkmersinbiologicalsequences |
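
The two selection rules summarized in the description field can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names are hypothetical, and plain lexicographic string order stands in for the k-mer and s-mer ordering, which the abstract leaves unspecified (practical tools often use a hash-based order instead). Ties for the smallest s-mer are treated inclusively, which is also an assumption.

```python
def kmers(seq, k):
    """Return all substrings of length k, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def is_closed_syncmer(kmer, s):
    """True if the smallest s-mer sits at the start or end of the k-mer.

    Assumption: lexicographic order; a tie with the smallest value at
    either end also counts.
    """
    smers = kmers(kmer, s)
    smallest = min(smers)
    return smers[0] == smallest or smers[-1] == smallest


def closed_syncmer_positions(seq, k, s):
    """Start positions of k-mers selected by the closed-syncmer rule.

    The decision for each k-mer depends on that k-mer alone.
    """
    return [i for i, kmer in enumerate(kmers(seq, k))
            if is_closed_syncmer(kmer, s)]


def minimizer_positions(seq, k, w):
    """Start positions of minimizers: the smallest k-mer in each window
    of w consecutive k-mers (leftmost on ties).

    The decision for each k-mer depends on its neighbours in the window.
    """
    all_kmers = kmers(seq, k)
    selected = set()
    for start in range(len(all_kmers) - w + 1):
        window = all_kmers[start:start + w]
        selected.add(start + window.index(min(window)))
    return sorted(selected)


if __name__ == "__main__":
    seq = "ACGTACGTTGCAAGCT"
    print("closed syncmers (k=5, s=2):", closed_syncmer_positions(seq, 5, 2))
    print("minimizers (k=5, w=3):    ", minimizer_positions(seq, 5, 3))
```

Because `is_closed_syncmer` looks only at the k-mer itself, the same k-mer is selected wherever it occurs, which is the synchronization property the abstract describes; `minimizer_positions` has to compare each k-mer with its window neighbours, so a change in flanking sequence can alter the selection.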