
Effective sequence similarity detection with strobemers

k-mer-based methods are widely used in bioinformatics for various types of sequence comparisons. However, a single mutation will mutate k consecutive k-mers and make most k-mer-based applications for sequence comparison sensitive to variable mutation rates. Many techniques have been studied to overc...
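The claim that a single mutation changes k consecutive k-mers can be checked with a short sketch. The function name `changed_kmer_starts` and the toy sequence are illustrative assumptions, not part of the article.

```python
def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping k-mers of seq, in order of start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def changed_kmer_starts(seq: str, pos: int, new_base: str, k: int) -> list[int]:
    """Start positions of k-mers that differ after substituting seq[pos].

    Hypothetical helper for illustration; assumes new_base != seq[pos].
    """
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    before, after = kmers(seq, k), kmers(mutated, k)
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


seq = "ACGTACGTACGTACGTACGT"  # toy sequence, 20 nt
# Substituting position 10 (G -> T) with k = 5 alters the five k-mers
# that overlap that position.
print(changed_kmer_starts(seq, 10, "T", 5))  # [6, 7, 8, 9, 10]
```

Near the ends of a sequence fewer than k k-mers overlap the mutated position, so k is an upper bound, reached everywhere except within k - 1 positions of either end.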


Bibliographic Details
Main author:	Sahlin, Kristoffer
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory Press 2021
Subjects:	Method
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8559714/ https://www.ncbi.nlm.nih.gov/pubmed/34667119 http://dx.doi.org/10.1101/gr.275648.121

author	Sahlin, Kristoffer
collection	PubMed
description	k-mer-based methods are widely used in bioinformatics for various types of sequence comparisons. However, a single mutation will mutate k consecutive k-mers and make most k-mer-based applications for sequence comparison sensitive to variable mutation rates. Many techniques have been studied to overcome this sensitivity, for example, spaced k-mers and k-mer permutation techniques, but these techniques do not handle indels well. For indels, pairs or groups of small k-mers are commonly used, but these methods first produce k-mer matches, and only in a second step, a pairing or grouping of k-mers is performed. Such techniques produce many redundant k-mer matches owing to the size of k. Here, we propose strobemers as an alternative to k-mers for sequence comparison. Intuitively, strobemers consist of two or more linked shorter k-mers, where the combination of linked k-mers is decided by a hash function. We use simulated data to show that strobemers provide more evenly distributed sequence matches and are less sensitive to different mutation rates than k-mers and spaced k-mers. Strobemers also produce higher match coverage across sequences. We further implement a proof-of-concept sequence-matching tool StrobeMap and use synthetic and biological Oxford Nanopore sequencing data to show the utility of using strobemers for sequence comparison in different contexts such as sequence clustering and alignment scenarios.
format	Online Article Text
id	pubmed-8559714
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Cold Spring Harbor Laboratory Press
record_format	MEDLINE/PubMed
spelling	pubmed-8559714 2021-11-10 Effective sequence similarity detection with strobemers Sahlin, Kristoffer Genome Res Method Cold Spring Harbor Laboratory Press 2021-11 /pmc/articles/PMC8559714/ /pubmed/34667119 http://dx.doi.org/10.1101/gr.275648.121 Text en © 2021 Sahlin; Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at https://creativecommons.org/licenses/by/4.0/.
title	Effective sequence similarity detection with strobemers
topic	Method
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8559714/ https://www.ncbi.nlm.nih.gov/pubmed/34667119 http://dx.doi.org/10.1101/gr.275648.121
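The description field in this record says that strobemers consist of two or more linked shorter k-mers, with the combination decided by a hash function. The following is a minimal sketch of that idea for two linked k-mers. The selection rule (minimizing a summed CRC32 hash over a downstream window), the parameter names `w_min` and `w_max`, and the function name `linked_kmer_pairs` are assumptions made for illustration; the record does not give the article's exact construction, and this is not the StrobeMap implementation.

```python
import zlib


def _h(s: str) -> int:
    """Deterministic hash of a k-mer (CRC32, chosen only for reproducibility)."""
    return zlib.crc32(s.encode("ascii"))


def linked_kmer_pairs(seq: str, k: int, w_min: int, w_max: int):
    """Build seeds from two linked k-mers (illustrative sketch only).

    For each start i, the first k-mer is seq[i:i+k]. The second k-mer starts
    at some j in [i + w_min, i + w_max], chosen as the position that minimizes
    a hash combining both k-mers. Assumes w_min >= k so the two k-mers do
    not overlap.
    """
    n = len(seq)
    seeds = []
    for i in range(n - k + 1):
        lo = i + w_min
        hi = min(i + w_max, n - k)  # second k-mer must fit in the sequence
        if lo > hi:
            break  # no room left for a second k-mer
        first = seq[i:i + k]
        h_first = _h(first)
        # Assumed linking rule: pick j with the smallest combined hash value.
        j = min(range(lo, hi + 1),
                key=lambda p: (h_first + _h(seq[p:p + k])) % (2 ** 32))
        seeds.append((i, j, first + seq[j:j + k]))
    return seeds


seq = "ACGTTGCATGCCGATAGCTAGGCTAACGTA"  # toy sequence, 30 nt
for i, j, seed in linked_kmer_pairs(seq, k=3, w_min=3, w_max=8)[:5]:
    print(i, j, seed)
```

Because the second k-mer's offset varies with the sequence content, the spacing between the two linked k-mers is not fixed. This is the property the description contrasts with k-mers and spaced k-mers when it discusses indels.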
