muBLASTP: database-indexed protein sequence search on multicore CPUs
BACKGROUND: The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence se...
Main Authors: | Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-chun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5096327/ https://www.ncbi.nlm.nih.gov/pubmed/27809763 http://dx.doi.org/10.1186/s12859-016-1302-4 |
_version_ | 1782465451534581760 |
---|---|
author | Zhang, Jing Misra, Sanchit Wang, Hao Feng, Wu-chun |
author_facet | Zhang, Jing Misra, Sanchit Wang, Hao Feng, Wu-chun |
author_sort | Zhang, Jing |
collection | PubMed |
description | BACKGROUND: The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they either cannot deliver the same level of sensitivity as query-indexed BLAST (i.e., NCBI BLAST) or support only nucleotide sequence search (e.g., MegaBLAST). Because query indexing and database indexing pose different challenges and have different characteristics, the existing techniques for query-indexed search cannot be applied to database-indexed search. RESULTS: muBLASTP, a novel database-indexed BLAST for protein sequence search, returns hits identical to those of NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, single-threaded muBLASTP achieves up to a 4.41-fold speedup for the alignment stages and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, multithreaded muBLASTP achieves up to a 5.7-fold speedup for the alignment stages and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. CONCLUSIONS: With a newly designed index structure for protein databases and associated optimizations to the BLASTP algorithm, we refactored BLASTP for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1302-4) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5096327 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5096327 2016-11-07 muBLASTP: database-indexed protein sequence search on multicore CPUs Zhang, Jing Misra, Sanchit Wang, Hao Feng, Wu-chun BMC Bioinformatics Software BACKGROUND: The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they either cannot deliver the same level of sensitivity as query-indexed BLAST (i.e., NCBI BLAST) or support only nucleotide sequence search (e.g., MegaBLAST). Because query indexing and database indexing pose different challenges and have different characteristics, the existing techniques for query-indexed search cannot be applied to database-indexed search. RESULTS: muBLASTP, a novel database-indexed BLAST for protein sequence search, returns hits identical to those of NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, single-threaded muBLASTP achieves up to a 4.41-fold speedup for the alignment stages and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, multithreaded muBLASTP achieves up to a 5.7-fold speedup for the alignment stages and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. CONCLUSIONS: With a newly designed index structure for protein databases and associated optimizations to the BLASTP algorithm, we refactored BLASTP for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1302-4) contains supplementary material, which is available to authorized users. 
BioMed Central 2016-11-04 /pmc/articles/PMC5096327/ /pubmed/27809763 http://dx.doi.org/10.1186/s12859-016-1302-4 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Zhang, Jing Misra, Sanchit Wang, Hao Feng, Wu-chun muBLASTP: database-indexed protein sequence search on multicore CPUs |
title | muBLASTP: database-indexed protein sequence search on multicore CPUs |
title_full | muBLASTP: database-indexed protein sequence search on multicore CPUs |
title_fullStr | muBLASTP: database-indexed protein sequence search on multicore CPUs |
title_full_unstemmed | muBLASTP: database-indexed protein sequence search on multicore CPUs |
title_short | muBLASTP: database-indexed protein sequence search on multicore CPUs |
title_sort | mublastp: database-indexed protein sequence search on multicore cpus |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5096327/ https://www.ncbi.nlm.nih.gov/pubmed/27809763 http://dx.doi.org/10.1186/s12859-016-1302-4 |
work_keys_str_mv | AT zhangjing mublastpdatabaseindexedproteinsequencesearchonmulticorecpus AT misrasanchit mublastpdatabaseindexedproteinsequencesearchonmulticorecpus AT wanghao mublastpdatabaseindexedproteinsequencesearchonmulticorecpus AT fengwuchun mublastpdatabaseindexedproteinsequencesearchonmulticorecpus |