
muBLASTP: database-indexed protein sequence search on multicore CPUs

BACKGROUND: The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence se...

Bibliographic Details
Main authors: Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-chun
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5096327/
https://www.ncbi.nlm.nih.gov/pubmed/27809763
http://dx.doi.org/10.1186/s12859-016-1302-4
_version_ 1782465451534581760
author Zhang, Jing
Misra, Sanchit
Wang, Hao
Feng, Wu-chun
author_facet Zhang, Jing
Misra, Sanchit
Wang, Hao
Feng, Wu-chun
author_sort Zhang, Jing
collection PubMed
description BACKGROUND: The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for the sequences most similar to a query sequence. Currently, the BLAST algorithm uses a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they either cannot deliver the same level of sensitivity as query-indexed BLAST, i.e., NCBI BLAST, or support only nucleotide sequence search, e.g., MegaBLAST. Because query indexing and database indexing pose different challenges and have different characteristics, existing techniques for query-indexed search cannot be applied to database-indexed search. RESULTS: muBLASTP, a novel database-indexed BLAST for protein sequence search, returns hits identical to those of NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, single-threaded muBLASTP achieves up to a 4.41-fold speedup for the alignment stages and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, multithreaded muBLASTP achieves up to a 5.7-fold speedup for the alignment stages and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. CONCLUSIONS: With a newly designed index structure for protein databases and associated optimizations in the BLASTP algorithm, we refactored the BLASTP algorithm for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1302-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5096327
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-50963272016-11-07 muBLASTP: database-indexed protein sequence search on multicore CPUs Zhang, Jing Misra, Sanchit Wang, Hao Feng, Wu-chun BMC Bioinformatics Software BACKGROUND: The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for the sequences most similar to a query sequence. Currently, the BLAST algorithm uses a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they either cannot deliver the same level of sensitivity as query-indexed BLAST, i.e., NCBI BLAST, or support only nucleotide sequence search, e.g., MegaBLAST. Because query indexing and database indexing pose different challenges and have different characteristics, existing techniques for query-indexed search cannot be applied to database-indexed search. RESULTS: muBLASTP, a novel database-indexed BLAST for protein sequence search, returns hits identical to those of NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, single-threaded muBLASTP achieves up to a 4.41-fold speedup for the alignment stages and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, multithreaded muBLASTP achieves up to a 5.7-fold speedup for the alignment stages and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. CONCLUSIONS: With a newly designed index structure for protein databases and associated optimizations in the BLASTP algorithm, we refactored the BLASTP algorithm for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1302-4) contains supplementary material, which is available to authorized users.
BioMed Central 2016-11-04 /pmc/articles/PMC5096327/ /pubmed/27809763 http://dx.doi.org/10.1186/s12859-016-1302-4 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Zhang, Jing
Misra, Sanchit
Wang, Hao
Feng, Wu-chun
muBLASTP: database-indexed protein sequence search on multicore CPUs
title muBLASTP: database-indexed protein sequence search on multicore CPUs
title_full muBLASTP: database-indexed protein sequence search on multicore CPUs
title_fullStr muBLASTP: database-indexed protein sequence search on multicore CPUs
title_full_unstemmed muBLASTP: database-indexed protein sequence search on multicore CPUs
title_short muBLASTP: database-indexed protein sequence search on multicore CPUs
title_sort mublastp: database-indexed protein sequence search on multicore cpus
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5096327/
https://www.ncbi.nlm.nih.gov/pubmed/27809763
http://dx.doi.org/10.1186/s12859-016-1302-4
work_keys_str_mv AT zhangjing mublastpdatabaseindexedproteinsequencesearchonmulticorecpus
AT misrasanchit mublastpdatabaseindexedproteinsequencesearchonmulticorecpus
AT wanghao mublastpdatabaseindexedproteinsequencesearchonmulticorecpus
AT fengwuchun mublastpdatabaseindexedproteinsequencesearchonmulticorecpus