Pairwise alignment of nucleotide sequences using maximal exact matches

BACKGROUND: Pairwise alignment of short DNA sequences with affine-gap scoring is a common processing step in a range of bioinformatics analyses. Dynamic programming (i.e. the Smith-Waterman algorithm) is widely used for this purpose. Despite using data-level parallelisation, pairwise alignment...

Bibliographic Details
Main Authors: Bayat, Arash; Gaëta, Bruno; Ignjatovic, Aleksandar; Parameswaran, Sri
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6528274/
https://www.ncbi.nlm.nih.gov/pubmed/31113356
http://dx.doi.org/10.1186/s12859-019-2827-0
_version_ 1783420181017526272
author Bayat, Arash
Gaëta, Bruno
Ignjatovic, Aleksandar
Parameswaran, Sri
collection PubMed
description BACKGROUND: Pairwise alignment of short DNA sequences with affine-gap scoring is a common processing step in a range of bioinformatics analyses. Dynamic programming (i.e. the Smith-Waterman algorithm) is widely used for this purpose. Even with data-level parallelisation, pairwise alignment is time-consuming. Faster alignment algorithms exist, but they are less accurate. RESULTS: In this paper, we present MEM-Align, a fast semi-global alignment algorithm for short DNA sequences that allows for affine-gap scoring and exploits sequence similarity. In contrast to traditional alignment methods (such as Smith-Waterman), where individual symbols are aligned, MEM-Align extracts Maximal Exact Matches (MEMs) using a bit-level parallel method and then uses a novel dynamic programming method to find a subset of MEMs that forms the alignment. MEM-Align tries to mimic the alignment produced by Smith-Waterman. As a result, for 99.9% of input sequence pairs, the computed alignment score is identical to the score computed by Smith-Waterman, yet MEM-Align is up to 14.5 times faster than the Smith-Waterman algorithm. The fast run-time is achieved by: (a) using a bit-level parallel method to extract MEMs; (b) processing MEMs rather than individual symbols; and (c) applying heuristics. CONCLUSIONS: MEM-Align is a potential candidate to replace other pairwise alignment algorithms used in processes such as DNA read mapping and variant calling. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2827-0) contains supplementary material, which is available to authorized users.
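The bit-level extraction of maximal exact matches mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Python arbitrary-precision integers as stand-ins for machine-word bit vectors, and the names `symbol_masks`, `runs_of_ones` and `find_mems` are hypothetical. For each diagonal (offset between the two sequences), one AND per symbol marks every matching position at once, and each maximal run of set bits is reported as one MEM.

```python
from collections import defaultdict


def symbol_masks(seq):
    """Map each symbol to an integer whose bit i is set where seq[i] == symbol."""
    masks = defaultdict(int)
    for pos, sym in enumerate(seq):
        masks[sym] |= 1 << pos
    return masks


def runs_of_ones(bits):
    """Yield (start, length) for every maximal run of set bits, lowest first."""
    while bits:
        start = (bits & -bits).bit_length() - 1        # index of lowest set bit
        shifted = bits >> start
        # Number of consecutive set bits at the bottom of `shifted`.
        length = ((shifted + 1) & ~shifted).bit_length() - 1
        yield start, length
        bits &= ~(((1 << length) - 1) << start)         # clear the reported run


def find_mems(a, b, min_len=1):
    """Return (start_in_a, start_in_b, length) for each maximal exact match.

    Illustrative only: diagonal d pairs a[i] with b[i - d].
    """
    masks_a, masks_b = symbol_masks(a), symbol_masks(b)
    mems = []
    for d in range(-(len(b) - 1), len(a)):
        match = 0
        for sym, mask_a in masks_a.items():
            mask_b = masks_b.get(sym, 0)
            shifted_b = mask_b << d if d >= 0 else mask_b >> -d
            match |= mask_a & shifted_b                 # bit i set iff a[i] == b[i - d]
        for i, length in runs_of_ones(match):
            if length >= min_len:
                mems.append((i, i - d, length))
    return mems


if __name__ == "__main__":
    for mem in find_mems("ACGTACGT", "ACGTTCGT", min_len=3):
        print(mem)
```

A run is bounded on both sides by a mismatch or a sequence end, so it cannot be extended in either direction, which is what makes it maximal. Choosing the subset of MEMs that forms the final alignment, with affine-gap scoring, is a separate dynamic programming step not covered by this sketch.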
format Online
Article
Text
id pubmed-6528274
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6528274 2019-05-28 Pairwise alignment of nucleotide sequences using maximal exact matches Bayat, Arash Gaëta, Bruno Ignjatovic, Aleksandar Parameswaran, Sri BMC Bioinformatics Methodology Article
BioMed Central 2019-05-21 /pmc/articles/PMC6528274/ /pubmed/31113356 http://dx.doi.org/10.1186/s12859-019-2827-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Pairwise alignment of nucleotide sequences using maximal exact matches
topic Methodology Article