Seedability: optimizing alignment parameters for sensitive sequence comparison

Bibliographic Details
Main Authors: Ayad, Lorraine A K; Chikhi, Rayan; Pissis, Solon P
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10444664/
https://www.ncbi.nlm.nih.gov/pubmed/37621456
http://dx.doi.org/10.1093/bioadv/vbad108
author Ayad, Lorraine A K
Chikhi, Rayan
Pissis, Solon P
collection PubMed
description MOTIVATION: Most sequence alignment techniques make use of exact k-mer hits, called seeds, as anchors to optimize alignment speed. A large number of bioinformatics tools employing seed-based alignment techniques, such as Minimap2, use a single value of k per sequencing technology, without a strong guarantee that this is the best possible value. Given the ubiquity of sequence alignment, identifying values of k that lead to more sensitive alignments is thus an important task. To aid this, we present Seedability, a seed-based alignment framework designed for estimating an optimal seed k-mer length (as well as a minimal number of shared seeds) based on a given alignment identity threshold. In particular, we were motivated to make Minimap2 more sensitive in the pairwise alignment of short sequences. RESULTS: The experimental results herein show improved alignments of short and divergent sequences when using the parameter values determined by Seedability in comparison to the default values of Minimap2. We also show several cases of pairs of real divergent sequences, where the default parameter values of Minimap2 yield no output alignments, but the values output by Seedability produce plausible alignments. AVAILABILITY AND IMPLEMENTATION: https://github.com/lorrainea/Seedability (distributed under GPL v3.0).
format Online
Article
Text
id pubmed-10444664
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10444664 2023-08-24 Seedability: optimizing alignment parameters for sensitive sequence comparison Ayad, Lorraine A K Chikhi, Rayan Pissis, Solon P Bioinform Adv Original Article MOTIVATION: Most sequence alignment techniques make use of exact k-mer hits, called seeds, as anchors to optimize alignment speed. A large number of bioinformatics tools employing seed-based alignment techniques, such as Minimap2, use a single value of k per sequencing technology, without a strong guarantee that this is the best possible value. Given the ubiquity of sequence alignment, identifying values of k that lead to more sensitive alignments is thus an important task. To aid this, we present Seedability, a seed-based alignment framework designed for estimating an optimal seed k-mer length (as well as a minimal number of shared seeds) based on a given alignment identity threshold. In particular, we were motivated to make Minimap2 more sensitive in the pairwise alignment of short sequences. RESULTS: The experimental results herein show improved alignments of short and divergent sequences when using the parameter values determined by Seedability in comparison to the default values of Minimap2. We also show several cases of pairs of real divergent sequences, where the default parameter values of Minimap2 yield no output alignments, but the values output by Seedability produce plausible alignments. AVAILABILITY AND IMPLEMENTATION: https://github.com/lorrainea/Seedability (distributed under GPL v3.0). Oxford University Press 2023-08-12 /pmc/articles/PMC10444664/ /pubmed/37621456 http://dx.doi.org/10.1093/bioadv/vbad108 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Seedability: optimizing alignment parameters for sensitive sequence comparison
topic Original Article