pblat: a multithread blat algorithm speeding up aligning sequences to genomes

BACKGROUND: The blat program is a widely used sequence alignment tool. It is especially useful for aligning long sequences and for gapped mapping, which other fast sequence mappers designed for short reads cannot perform properly. However, blat is single-threaded, and when used to map whole geno...

Bibliographic Details
Main Authors: Wang, Meng; Kong, Lei
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6334396/
https://www.ncbi.nlm.nih.gov/pubmed/30646844
http://dx.doi.org/10.1186/s12859-019-2597-8
author Wang, Meng
Kong, Lei
collection PubMed
description BACKGROUND: The blat program is a widely used sequence alignment tool. It is especially useful for aligning long sequences and for gapped mapping, which other fast sequence mappers designed for short reads cannot perform properly. However, blat is single-threaded, and when it is used to map whole-genome or whole-transcriptome sequences to reference genomes it can take days to finish, making it unsuitable for large-scale sequencing projects and iterative analysis. Here, we present pblat (parallel blat), a parallelized blat algorithm with multithread and cluster computing support, which rapidly fine-maps large-scale DNA/RNA sequences against genomes. RESULTS: The pblat algorithm takes advantage of modern multicore processors, and its run time decreases significantly as the number of threads increases. pblat uses almost the same amount of memory as blat, and the results it generates are identical to those generated by blat. The pblat tool is easy to install and runs on Linux and Mac OS systems. In addition, we provide a cluster version of pblat (pblat-cluster) that runs on computing clusters with MPI support. CONCLUSION: pblat is open source and freely available to non-commercial users. It is easy to install and easy to use. pblat and pblat-cluster would facilitate the high-throughput mapping of large-scale genomic and transcript sequences to reference genomes with both high speed and high precision.
format Online
Article
Text
id pubmed-6334396
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6334396 2019-01-23 pblat: a multithread blat algorithm speeding up aligning sequences to genomes Wang, Meng Kong, Lei BMC Bioinformatics Software BioMed Central 2019-01-15 /pmc/articles/PMC6334396/ /pubmed/30646844 http://dx.doi.org/10.1186/s12859-019-2597-8 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title pblat: a multithread blat algorithm speeding up aligning sequences to genomes
topic Software