Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences

Minimizers are widely used to select subsets of fixed-length substrings (k-mers) from biological sequences in applications ranging from read mapping to taxonomy prediction and indexing of large datasets. The minimizer of a string of w consecutive k-mers is the k-mer with smallest value according to...

Bibliographic Details
Main author: Edgar, Robert
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7869670/
https://www.ncbi.nlm.nih.gov/pubmed/33604186
http://dx.doi.org/10.7717/peerj.10805
author Edgar, Robert
collection PubMed
description Minimizers are widely used to select subsets of fixed-length substrings (k-mers) from biological sequences in applications ranging from read mapping to taxonomy prediction and indexing of large datasets. The minimizer of a string of w consecutive k-mers is the k-mer with smallest value according to an ordering of all k-mers. Syncmers are defined here as a family of alternative methods which select k-mers by inspecting the position of the smallest-valued substring of length s < k within the k-mer. For example, a closed syncmer is selected if its smallest s-mer is at the start or end of the k-mer. At least one closed syncmer must be found in every window of length (k − s) k-mers. Unlike a minimizer, a syncmer is identified by its sequence alone, and is therefore synchronized in the following sense: if a given k-mer is selected from one sequence, it will also be selected from any other sequence. Also, minimizers can be deleted by mutations in flanking sequence, which cannot happen with syncmers. Experiments on minimizers with parameters used in the minimap2 read mapper and Kraken taxonomy prediction algorithm respectively show that syncmers can simultaneously achieve both lower density and higher conservation compared to minimizers.
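The two selection rules described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names are hypothetical, and plain lexicographic string order stands in for the "ordering of all k-mers" (practical tools often use a hash-based order instead). Ties are resolved by value, so a k-mer counts as a closed syncmer whenever its first or last s-mer equals the smallest s-mer.

```python
def is_closed_syncmer(kmer: str, s: int) -> bool:
    """True if the smallest s-mer of kmer sits at its start or end.

    Assumption: s-mers are compared lexicographically.
    """
    if not 0 < s < len(kmer):
        raise ValueError("s must satisfy 0 < s < k")
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    smallest = min(smers)
    return smers[0] == smallest or smers[-1] == smallest


def closed_syncmer_positions(seq: str, k: int, s: int) -> list[int]:
    """Start positions of all k-mers in seq that are closed syncmers."""
    return [i for i in range(len(seq) - k + 1)
            if is_closed_syncmer(seq[i:i + k], s)]


def minimizer_positions(seq: str, k: int, w: int) -> list[int]:
    """Start positions of the minimizer of every window of w consecutive k-mers.

    Assumption: k-mers are compared lexicographically; leftmost wins ties.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        selected.add(start + window.index(min(window)))
    return sorted(selected)


def every_window_has_closed_syncmer(seq: str, k: int, s: int) -> bool:
    """Check the window guarantee: each run of (k - s) consecutive k-mers
    contains at least one closed syncmer."""
    chosen = set(closed_syncmer_positions(seq, k, s))
    w = k - s
    n_kmers = len(seq) - k + 1
    return all(any(p in chosen for p in range(start, start + w))
               for start in range(n_kmers - w + 1))
```

Note that is_closed_syncmer takes only the k-mer itself, so the same k-mer receives the same answer in every sequence, whereas minimizer_positions depends on the neighbouring k-mers in each window. That is the context dependence the abstract contrasts with syncmers.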
format Online
Article
Text
id pubmed-7869670
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7869670 2021-02-17 Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences Edgar, Robert PeerJ Bioinformatics PeerJ Inc. 2021-02-05 /pmc/articles/PMC7869670/ /pubmed/33604186 http://dx.doi.org/10.7717/peerj.10805 Text en © 2021 Edgar https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed.
For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences
topic Bioinformatics