
GPU accelerated sequence alignment with traceback for GATK HaplotypeCaller

BACKGROUND: Pairwise sequence alignment is widely used in many biological tools and applications. Existing GPU accelerated implementations mainly focus on calculating the optimal alignment score and omit identifying the optimal alignment itself. In GATK HaplotypeCaller (HC), the semi-global pairwise seq...


Bibliographic Details
Main Authors: Ren, Shanshan, Ahmed, Nauman, Bertels, Koen, Al-Ars, Zaid
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6456962/
https://www.ncbi.nlm.nih.gov/pubmed/30967111
http://dx.doi.org/10.1186/s12864-019-5468-9
_version_ 1783409835188944896
author Ren, Shanshan
Ahmed, Nauman
Bertels, Koen
Al-Ars, Zaid
author_facet Ren, Shanshan
Ahmed, Nauman
Bertels, Koen
Al-Ars, Zaid
author_sort Ren, Shanshan
collection PubMed
description BACKGROUND: Pairwise sequence alignment is widely used in many biological tools and applications. Existing GPU accelerated implementations mainly focus on calculating the optimal alignment score and omit identifying the optimal alignment itself. In GATK HaplotypeCaller (HC), the semi-global pairwise sequence alignment with traceback has so far been difficult to accelerate effectively on GPUs. RESULTS: We first analyze the characteristics of the semi-global alignment with traceback in GATK HC and then propose a new algorithm that allows for retrieving the optimal alignment efficiently on GPUs. For the first stage, we choose an intra-task parallelization model to calculate the position of the optimal alignment score and the backtracking matrix. Moreover, in the first stage, our GPU implementation also records the length of consecutive matches/mismatches, in addition to the lengths of consecutive insertions and deletions that the CPU-based implementation already records. This helps efficiently retrieve the backtracking matrix to obtain the optimal alignment in the second stage. CONCLUSIONS: Experimental results show that our alignment kernel with traceback is up to 80x and 14.14x faster than its CPU counterpart with synthetic datasets and real datasets, respectively. When integrated into GATK HC (alongside a GPU accelerated pair-HMMs forward kernel), the overall speedup is 2.3x over the baseline GATK HC implementation, and 1.34x over the GATK HC implementation with the integrated GPU-based pair-HMMs forward algorithm. Although the methods proposed in this paper are intended to improve the performance of GATK HC, they can also be used in other pairwise alignments and applications.
format Online
Article
Text
id pubmed-6456962
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6456962 2019-04-19 GPU accelerated sequence alignment with traceback for GATK HaplotypeCaller Ren, Shanshan Ahmed, Nauman Bertels, Koen Al-Ars, Zaid BMC Genomics Research
BioMed Central 2019-04-04 /pmc/articles/PMC6456962/ /pubmed/30967111 http://dx.doi.org/10.1186/s12864-019-5468-9 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Ren, Shanshan
Ahmed, Nauman
Bertels, Koen
Al-Ars, Zaid
GPU accelerated sequence alignment with traceback for GATK HaplotypeCaller
title GPU accelerated sequence alignment with traceback for GATK HaplotypeCaller
title_full GPU accelerated sequence alignment with traceback for GATK HaplotypeCaller
title_fullStr GPU accelerated sequence alignment with traceback for GATK HaplotypeCaller
title_full_unstemmed GPU accelerated sequence alignment with traceback for GATK HaplotypeCaller
title_short GPU accelerated sequence alignment with traceback for GATK HaplotypeCaller
title_sort gpu accelerated sequence alignment with traceback for gatk haplotypecaller
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6456962/
https://www.ncbi.nlm.nih.gov/pubmed/30967111
http://dx.doi.org/10.1186/s12864-019-5468-9
work_keys_str_mv AT renshanshan gpuacceleratedsequencealignmentwithtracebackforgatkhaplotypecaller
AT ahmednauman gpuacceleratedsequencealignmentwithtracebackforgatkhaplotypecaller
AT bertelskoen gpuacceleratedsequencealignmentwithtracebackforgatkhaplotypecaller
AT alarszaid gpuacceleratedsequencealignmentwithtracebackforgatkhaplotypecaller
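
Illustrative sketch (not part of the record): the two-stage scheme described in the abstract, a matrix fill that records run lengths followed by a traceback that jumps over whole runs, might look roughly like the sequential Python below. This is not the authors' GPU kernel and not GATK's implementation. The function name `semi_global_align`, the scores (`MATCH`, `MISMATCH`, `GAP`), the linear gap penalty, the tie-breaking order, and the choice of which sequence ends are free are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed scoring values for illustration only; not GATK's parameters.
MATCH, MISMATCH, GAP = 1, -1, -2


def semi_global_align(ref: str, query: str) -> Tuple[int, int, List[Tuple[str, int]]]:
    """Align all of `query` inside `ref`; leading/trailing `ref` bases are free.

    Returns (score, ref_start, runs), where runs is a list of (op, length)
    with op in {"M" (match/mismatch), "I" (insertion), "D" (deletion)}.
    """
    n, m = len(ref), len(query)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    op = [[""] * (n + 1) for _ in range(m + 1)]   # "" marks a stop cell (row 0)
    run = [[0] * (n + 1) for _ in range(m + 1)]   # length of the run ending here

    # First column: the query prefix can only be aligned as one insertion run.
    for i in range(1, m + 1):
        score[i][0] = i * GAP
        op[i][0], run[i][0] = "I", i

    # Stage 1: fill scores and, per cell, the operation plus its run length.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = MATCH if query[i - 1] == ref[j - 1] else MISMATCH
            candidates = [
                (score[i - 1][j - 1] + sub, "M", i - 1, j - 1),
                (score[i - 1][j] + GAP, "I", i - 1, j),
                (score[i][j - 1] + GAP, "D", i, j - 1),
            ]
            # max() keeps the first maximal candidate, so ties prefer M, then I, then D.
            best, best_op, pi, pj = max(candidates, key=lambda c: c[0])
            score[i][j], op[i][j] = best, best_op
            # Extend the predecessor's run if it used the same operation.
            run[i][j] = run[pi][pj] + 1 if op[pi][pj] == best_op else 1

    # Stage 2: start at the best cell of the last row and jump run by run.
    end = max(range(n + 1), key=lambda j: score[m][j])
    i, j = m, end
    runs: List[Tuple[str, int]] = []
    while i > 0:
        o, length = op[i][j], run[i][j]
        runs.append((o, length))
        if o == "M":
            i, j = i - length, j - length
        elif o == "I":
            i -= length
        else:  # "D"
            j -= length
    runs.reverse()
    return score[m][end], j, runs


if __name__ == "__main__":
    print(semi_global_align("ACGTGACGT", "ACGTACGT"))
```

Because each cell stores the length of the run that ends there, the traceback loop executes once per run rather than once per aligned base, which is the property the abstract attributes to recording consecutive matches/mismatches alongside insertions and deletions.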