Cargando…

MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence

BACKGROUND: Programs based on hash tables and Burrows-Wheeler are very fast for mapping short reads to genomes but have low accuracy in the presence of mismatches and gaps. Such reads can be aligned accurately with the Smith-Waterman algorithm but it can take hours and days to map millions of reads...

Descripción completa

Detalles Bibliográficos
Autores principales: Turki, Turki, Roshan, Usman
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4289234/
https://www.ncbi.nlm.nih.gov/pubmed/25398475
http://dx.doi.org/10.1186/1471-2164-15-969
_version_ 1782352072138555392
author Turki, Turki
Roshan, Usman
author_facet Turki, Turki
Roshan, Usman
author_sort Turki, Turki
collection PubMed
description BACKGROUND: Programs based on hash tables and Burrows-Wheeler are very fast for mapping short reads to genomes but have low accuracy in the presence of mismatches and gaps. Such reads can be aligned accurately with the Smith-Waterman algorithm but it can take hours and days to map millions of reads even for bacteria genomes. RESULTS: We introduce a GPU program called MaxSSmap with the aim of achieving comparable accuracy to Smith-Waterman but with faster runtimes. Similar to most programs MaxSSmap identifies a local region of the genome followed by exact alignment. Instead of using hash tables or Burrows-Wheeler in the first part, MaxSSmap calculates maximum scoring subsequence score between the read and disjoint fragments of the genome in parallel on a GPU and selects the highest scoring fragment for exact alignment. We evaluate MaxSSmap’s accuracy and runtime when mapping simulated Illumina E.coli and human chromosome one reads of different lengths and 10% to 30% mismatches with gaps to the E.coli genome and human chromosome one. We also demonstrate applications on real data by mapping ancient horse DNA reads to modern genomes and unmapped paired reads from NA12878 in 1000 genomes. CONCLUSIONS: We show that MaxSSmap attains comparable high accuracy and low error to fast Smith-Waterman programs yet has much lower runtimes. We show that MaxSSmap can map reads rejected by BWA and NextGenMap with high accuracy and low error much faster than if Smith-Waterman were used. On short read lengths of 36 and 51 both MaxSSmap and Smith-Waterman have lower accuracy compared to at higher lengths. On real data MaxSSmap produces many alignments with high score and mapping quality that are not given by NextGenMap and BWA. The MaxSSmap source code in CUDA and OpenCL is freely available from http://www.cs.njit.edu/usman/MaxSSmap. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-969) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4289234
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-42892342015-01-11 MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence Turki, Turki Roshan, Usman BMC Genomics Software BACKGROUND: Programs based on hash tables and Burrows-Wheeler are very fast for mapping short reads to genomes but have low accuracy in the presence of mismatches and gaps. Such reads can be aligned accurately with the Smith-Waterman algorithm but it can take hours and days to map millions of reads even for bacteria genomes. RESULTS: We introduce a GPU program called MaxSSmap with the aim of achieving comparable accuracy to Smith-Waterman but with faster runtimes. Similar to most programs MaxSSmap identifies a local region of the genome followed by exact alignment. Instead of using hash tables or Burrows-Wheeler in the first part, MaxSSmap calculates maximum scoring subsequence score between the read and disjoint fragments of the genome in parallel on a GPU and selects the highest scoring fragment for exact alignment. We evaluate MaxSSmap’s accuracy and runtime when mapping simulated Illumina E.coli and human chromosome one reads of different lengths and 10% to 30% mismatches with gaps to the E.coli genome and human chromosome one. We also demonstrate applications on real data by mapping ancient horse DNA reads to modern genomes and unmapped paired reads from NA12878 in 1000 genomes. CONCLUSIONS: We show that MaxSSmap attains comparable high accuracy and low error to fast Smith-Waterman programs yet has much lower runtimes. We show that MaxSSmap can map reads rejected by BWA and NextGenMap with high accuracy and low error much faster than if Smith-Waterman were used. On short read lengths of 36 and 51 both MaxSSmap and Smith-Waterman have lower accuracy compared to at higher lengths. On real data MaxSSmap produces many alignments with high score and mapping quality that are not given by NextGenMap and BWA. The MaxSSmap source code in CUDA and OpenCL is freely available from http://www.cs.njit.edu/usman/MaxSSmap. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-969) contains supplementary material, which is available to authorized users. BioMed Central 2014-11-15 /pmc/articles/PMC4289234/ /pubmed/25398475 http://dx.doi.org/10.1186/1471-2164-15-969 Text en © Turki and Roshan; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Turki, Turki
Roshan, Usman
MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence
title MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence
title_full MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence
title_fullStr MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence
title_full_unstemmed MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence
title_short MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence
title_sort maxssmap: a gpu program for mapping divergent short reads to genomes with the maximum scoring subsequence
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4289234/
https://www.ncbi.nlm.nih.gov/pubmed/25398475
http://dx.doi.org/10.1186/1471-2164-15-969
work_keys_str_mv AT turkiturki maxssmapagpuprogramformappingdivergentshortreadstogenomeswiththemaximumscoringsubsequence
AT roshanusman maxssmapagpuprogramformappingdivergentshortreadstogenomeswiththemaximumscoringsubsequence