Cargando…
Bit-parallel sequence-to-graph alignment
MOTIVATION: Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to rep...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Oxford University Press
2019
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6761980/ https://www.ncbi.nlm.nih.gov/pubmed/30851095 http://dx.doi.org/10.1093/bioinformatics/btz162 |
_version_ | 1783454134032138240 |
---|---|
author | Rautiainen, Mikko Mäkinen, Veli Marschall, Tobias |
author_facet | Rautiainen, Mikko Mäkinen, Veli Marschall, Tobias |
author_sort | Rautiainen, Mikko |
collection | PubMed |
description | MOTIVATION: Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include genome assembly, read error correction and variant calling with respect to a variation graph. RESULTS: We generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers’ bitvector algorithm for semi-global alignment. These linear algorithms are both based on processing w sequence characters with a constant number of operations, where w is the word size of the machine (commonly 64), and achieve a speedup of up to w over naive algorithms. For a graph with [Formula: see text] nodes and [Formula: see text] edges and a sequence of length m, our bitvector-based graph alignment algorithm reaches a worst case runtime of [Formula: see text] for acyclic graphs and [Formula: see text] for arbitrary cyclic graphs. We apply it to five different types of graphs and observe a speedup between 3-fold and 20-fold compared with a previous (asymptotically optimal) alignment algorithm. AVAILABILITY AND IMPLEMENTATION: https://github.com/maickrau/GraphAligner SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6761980 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-67619802019-10-02 Bit-parallel sequence-to-graph alignment Rautiainen, Mikko Mäkinen, Veli Marschall, Tobias Bioinformatics Original Papers MOTIVATION: Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include genome assembly, read error correction and variant calling with respect to a variation graph. RESULTS: We generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers’ bitvector algorithm for semi-global alignment. These linear algorithms are both based on processing w sequence characters with a constant number of operations, where w is the word size of the machine (commonly 64), and achieve a speedup of up to w over naive algorithms. For a graph with [Formula: see text] nodes and [Formula: see text] edges and a sequence of length m, our bitvector-based graph alignment algorithm reaches a worst case runtime of [Formula: see text] for acyclic graphs and [Formula: see text] for arbitrary cyclic graphs. We apply it to five different types of graphs and observe a speedup between 3-fold and 20-fold compared with a previous (asymptotically optimal) alignment algorithm. AVAILABILITY AND IMPLEMENTATION: https://github.com/maickrau/GraphAligner SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-10-01 2019-03-09 /pmc/articles/PMC6761980/ /pubmed/30851095 http://dx.doi.org/10.1093/bioinformatics/btz162 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Rautiainen, Mikko Mäkinen, Veli Marschall, Tobias Bit-parallel sequence-to-graph alignment |
title | Bit-parallel sequence-to-graph alignment |
title_full | Bit-parallel sequence-to-graph alignment |
title_fullStr | Bit-parallel sequence-to-graph alignment |
title_full_unstemmed | Bit-parallel sequence-to-graph alignment |
title_short | Bit-parallel sequence-to-graph alignment |
title_sort | bit-parallel sequence-to-graph alignment |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6761980/ https://www.ncbi.nlm.nih.gov/pubmed/30851095 http://dx.doi.org/10.1093/bioinformatics/btz162 |
work_keys_str_mv | AT rautiainenmikko bitparallelsequencetographalignment AT makinenveli bitparallelsequencetographalignment AT marschalltobias bitparallelsequencetographalignment |