Effects of spaced k-mers on alignment-free genotyping
MOTIVATION: Alignment-free, k-mer based genotyping methods are a fast alternative to alignment-based methods and are particularly well suited for genotyping larger cohorts. The sensitivity of algorithms that work with k-mers can be increased by using spaced seeds; however, the application of space...
Main authors: | Häntze, Hartmut; Horton, Paul |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Genome Sequence Analysis |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311327/ https://www.ncbi.nlm.nih.gov/pubmed/37387138 http://dx.doi.org/10.1093/bioinformatics/btad202 |
_version_ | 1785066720124207104 |
---|---|
author | Häntze, Hartmut Horton, Paul |
author_facet | Häntze, Hartmut Horton, Paul |
author_sort | Häntze, Hartmut |
collection | PubMed |
description | MOTIVATION: Alignment-free, k-mer based genotyping methods are a fast alternative to alignment-based methods and are particularly well suited for genotyping larger cohorts. The sensitivity of algorithms that work with k-mers can be increased by using spaced seeds; however, the application of spaced seeds in k-mer based genotyping methods has not been researched yet. RESULTS: We add a spaced seeds functionality to the genotyping software PanGenie and use it to calculate genotypes. This significantly improves sensitivity and F-score when genotyping SNPs, indels, and structural variants on reads with low (5×) and high (30×) coverage. Improvements are greater than what could be achieved by just increasing the length of contiguous k-mers. Effect sizes are particularly large for low coverage data. If applications implement effective algorithms for hashing of spaced k-mers, spaced k-mers have the potential to become a useful technique in k-mer based genotyping. AVAILABILITY AND IMPLEMENTATION: The source code of our proposed tool MaskedPanGenie is openly available at https://github.com/hhaentze/MaskedPangenie. |
format | Online Article Text |
id | pubmed-10311327 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10311327 2023-07-01 Effects of spaced k-mers on alignment-free genotyping Häntze, Hartmut Horton, Paul Bioinformatics Genome Sequence Analysis MOTIVATION: Alignment-free, k-mer based genotyping methods are a fast alternative to alignment-based methods and are particularly well suited for genotyping larger cohorts. The sensitivity of algorithms that work with k-mers can be increased by using spaced seeds; however, the application of spaced seeds in k-mer based genotyping methods has not been researched yet. RESULTS: We add a spaced seeds functionality to the genotyping software PanGenie and use it to calculate genotypes. This significantly improves sensitivity and F-score when genotyping SNPs, indels, and structural variants on reads with low (5×) and high (30×) coverage. Improvements are greater than what could be achieved by just increasing the length of contiguous k-mers. Effect sizes are particularly large for low coverage data. If applications implement effective algorithms for hashing of spaced k-mers, spaced k-mers have the potential to become a useful technique in k-mer based genotyping. AVAILABILITY AND IMPLEMENTATION: The source code of our proposed tool MaskedPanGenie is openly available at https://github.com/hhaentze/MaskedPangenie. Oxford University Press 2023-06-30 /pmc/articles/PMC10311327/ /pubmed/37387138 http://dx.doi.org/10.1093/bioinformatics/btad202 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Genome Sequence Analysis Häntze, Hartmut Horton, Paul Effects of spaced k-mers on alignment-free genotyping |
title | Effects of spaced k-mers on alignment-free genotyping |
title_full | Effects of spaced k-mers on alignment-free genotyping |
title_fullStr | Effects of spaced k-mers on alignment-free genotyping |
title_full_unstemmed | Effects of spaced k-mers on alignment-free genotyping |
title_short | Effects of spaced k-mers on alignment-free genotyping |
title_sort | effects of spaced k-mers on alignment-free genotyping |
topic | Genome Sequence Analysis |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311327/ https://www.ncbi.nlm.nih.gov/pubmed/37387138 http://dx.doi.org/10.1093/bioinformatics/btad202 |
work_keys_str_mv | AT hantzehartmut effectsofspacedkmersonalignmentfreegenotyping AT hortonpaul effectsofspacedkmersonalignmentfreegenotyping |
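
Illustrative note: the abstract in this record refers to spaced seeds (spaced k-mers) without showing what they look like. The following is a minimal Python sketch of the general idea only, not the MaskedPanGenie implementation and not its hashing scheme. The function name `spaced_kmers`, the mask `1101011`, and the two toy sequences are assumptions made up for this example. A mask marks which positions of a window are kept (`1`) and which are ignored (`0`), so a single substitution that falls on an ignored position leaves the spaced k-mer unchanged.

```python
def spaced_kmers(seq: str, mask: str) -> list[str]:
    """Return the spaced k-mers of seq under mask ('1' = keep, '0' = ignore).

    Hypothetical helper for illustration; not taken from PanGenie or
    MaskedPanGenie.
    """
    if not mask or set(mask) - {"0", "1"}:
        raise ValueError("mask must be a non-empty string of '0' and '1'")
    keep = [i for i, m in enumerate(mask) if m == "1"]  # offsets that are kept
    span = len(mask)                                    # window length
    return [
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - span + 1)
    ]


# Toy sequences (made up): they differ by one substitution at index 4.
reference = "ACGTACGTAC"
variant = "ACGTTCGTAC"

contiguous_mask = "1111111"  # ordinary 7-mer: every position is kept
spaced_mask = "1101011"      # hypothetical mask: span 7, 5 kept positions

shared_contiguous = set(spaced_kmers(reference, contiguous_mask)) & set(
    spaced_kmers(variant, contiguous_mask)
)
shared_spaced = set(spaced_kmers(reference, spaced_mask)) & set(
    spaced_kmers(variant, spaced_mask)
)

print(sorted(shared_contiguous))  # []
print(sorted(shared_spaced))      # ['ACTCG', 'GTCTA']
```

In this toy case every contiguous 7-mer window overlaps the substitution, so the two sequences share no contiguous 7-mers, while two spaced k-mers still match because the substitution lands on an ignored position in those windows. Whether and how this carries over to genotyping accuracy is what the article itself evaluates; the sketch makes no claim about that.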