Effects of spaced k-mers on alignment-free genotyping

MOTIVATION: Alignment-free, k-mer-based genotyping methods are a fast alternative to alignment-based methods and are particularly well suited for genotyping larger cohorts. The sensitivity of algorithms that work with k-mers can be increased by using spaced seeds; however, the application of space...

Bibliographic Details
Main authors: Häntze, Hartmut; Horton, Paul
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Genome Sequence Analysis
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311327/
https://www.ncbi.nlm.nih.gov/pubmed/37387138
http://dx.doi.org/10.1093/bioinformatics/btad202
collection PubMed
description MOTIVATION: Alignment-free, k-mer-based genotyping methods are a fast alternative to alignment-based methods and are particularly well suited for genotyping larger cohorts. The sensitivity of algorithms that work with k-mers can be increased by using spaced seeds; however, the application of spaced seeds in k-mer-based genotyping methods has not yet been studied. RESULTS: We add spaced seed functionality to the genotyping software PanGenie and use it to calculate genotypes. This significantly improves sensitivity and F-score when genotyping SNPs, indels, and structural variants on reads with low (5×) and high (30×) coverage. The improvements are greater than what could be achieved by just increasing the length of contiguous k-mers. Effect sizes are particularly large for low-coverage data. If applications implement effective algorithms for hashing spaced k-mers, spaced k-mers have the potential to become a useful technique in k-mer-based genotyping. AVAILABILITY AND IMPLEMENTATION: The source code of our proposed tool MaskedPanGenie is openly available at https://github.com/hhaentze/MaskedPangenie.
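
For readers unfamiliar with the term, the idea of a spaced k-mer mentioned in the description can be illustrated with a short Python sketch. This is not taken from MaskedPanGenie or PanGenie; the function name `spaced_kmers`, the mask notation ('1' = position kept, '0' = position ignored), and the example sequences are assumptions made for illustration only.

```python
def spaced_kmers(seq, mask):
    """Return all spaced k-mers of seq under mask (hypothetical helper).

    mask is a string of '1' (keep this position) and '0' (ignore it).
    Its length is the span of the seed; the number of '1's is its weight.
    """
    keep = [i for i, m in enumerate(mask) if m == "1"]
    span = len(mask)
    return [
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - span + 1)
    ]


# A contiguous k-mer is the special case of a mask without '0' positions.
contiguous = spaced_kmers("ACGTA", "111")

# With the illustrative mask "1101", the third base of each window is ignored.
spaced = spaced_kmers("ACGTA", "1101")

# Two windows that differ only at an ignored position give the same spaced k-mer.
same = spaced_kmers("ACGT", "1101") == spaced_kmers("ACTT", "1101")

print(contiguous, spaced, same)
```

The sketch only shows how a mask selects positions and why a substitution at an ignored position leaves the spaced k-mer unchanged. It says nothing about which masks work well for genotyping or how spaced k-mers are hashed in practice.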
id pubmed-10311327
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10311327 2023-07-01. Bioinformatics: Genome Sequence Analysis. Oxford University Press, 2023-06-30. Text en. © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Effects of spaced k-mers on alignment-free genotyping
topic Genome Sequence Analysis