Cargando…

Introducing difference recurrence relations for faster semi-global alignment of long sequences

BACKGROUND: The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the...

Descripción completa

Detalles Bibliográficos
Autores principales: Suzuki, Hajime, Kasahara, Masahiro
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836832/
https://www.ncbi.nlm.nih.gov/pubmed/29504909
http://dx.doi.org/10.1186/s12859-018-2014-8
_version_ 1783304013555433472
author Suzuki, Hajime
Kasahara, Masahiro
author_facet Suzuki, Hajime
Kasahara, Masahiro
author_sort Suzuki, Hajime
collection PubMed
description BACKGROUND: The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith–Waterman–Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. RESULTS: We proposed a faster semi-global alignment algorithm, “difference recurrence relations,” that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. CONCLUSIONS: Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2014-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5836832
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-58368322018-03-07 Introducing difference recurrence relations for faster semi-global alignment of long sequences Suzuki, Hajime Kasahara, Masahiro BMC Bioinformatics Methodology BACKGROUND: The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith–Waterman–Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. RESULTS: We proposed a faster semi-global alignment algorithm, “difference recurrence relations,” that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. CONCLUSIONS: Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2014-8) contains supplementary material, which is available to authorized users. BioMed Central 2018-02-19 /pmc/articles/PMC5836832/ /pubmed/29504909 http://dx.doi.org/10.1186/s12859-018-2014-8 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology
Suzuki, Hajime
Kasahara, Masahiro
Introducing difference recurrence relations for faster semi-global alignment of long sequences
title Introducing difference recurrence relations for faster semi-global alignment of long sequences
title_full Introducing difference recurrence relations for faster semi-global alignment of long sequences
title_fullStr Introducing difference recurrence relations for faster semi-global alignment of long sequences
title_full_unstemmed Introducing difference recurrence relations for faster semi-global alignment of long sequences
title_short Introducing difference recurrence relations for faster semi-global alignment of long sequences
title_sort introducing difference recurrence relations for faster semi-global alignment of long sequences
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836832/
https://www.ncbi.nlm.nih.gov/pubmed/29504909
http://dx.doi.org/10.1186/s12859-018-2014-8
work_keys_str_mv AT suzukihajime introducingdifferencerecurrencerelationsforfastersemiglobalalignmentoflongsequences
AT kasaharamasahiro introducingdifferencerecurrencerelationsforfastersemiglobalalignmentoflongsequences