
iBLAST: Incremental BLAST of new sequences via automated e-value correction

Search results from local alignment search tools use statistical scores that are sensitive to the size of the database to report the quality of the result. For example, NCBI BLAST reports the best matches using similarity scores and expect values (i.e., e-values) calculated against the database size...

Bibliographic Details
Main Authors: Dash, Sajal, Rahman, Sarthok Rasique, Hines, Heather M., Feng, Wu-chun
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8062096/
https://www.ncbi.nlm.nih.gov/pubmed/33886589
http://dx.doi.org/10.1371/journal.pone.0249410
author Dash, Sajal
Rahman, Sarthok Rasique
Hines, Heather M.
Feng, Wu-chun
collection PubMed
description Search results from local alignment search tools use statistical scores that are sensitive to the size of the database to report the quality of the result. For example, NCBI BLAST reports the best matches using similarity scores and expect values (i.e., e-values) calculated against the database size. Given the astronomical growth in genomics data throughout a genomic research investigation, sequence databases grow as new sequences are continuously being added to these databases. As a consequence, the results (e.g., best hits) and associated statistics (e.g., e-values) for a specific set of queries may change over the course of a genomic investigation. Thus, to update the results of a previously conducted BLAST search to find the best matches on an updated database, scientists must currently rerun the BLAST search against the entire updated database, which translates into irrecoverable and, in turn, wasted execution time, money, and computational resources. To address this issue, we devise a novel and efficient method to redeem past BLAST searches by introducing iBLAST. iBLAST leverages previous BLAST search results to conduct the same query search but only on the incremental (i.e., newly added) part of the database, recomputes the associated critical statistics such as e-values, and combines these results to produce updated search results. Our experimental results and fidelity analyses show that iBLAST delivers search results that are identical to NCBI BLAST at a substantially reduced computational cost, i.e., iBLAST performs (1 + δ)/δ times faster than NCBI BLAST, where δ represents the fraction of database growth. We then present three different use cases to demonstrate that iBLAST can enable efficient biological discovery at a much faster speed with a substantially reduced computational cost.
format Online
Article
Text
id pubmed-8062096
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8062096 2021-05-04 iBLAST: Incremental BLAST of new sequences via automated e-value correction Dash, Sajal Rahman, Sarthok Rasique Hines, Heather M. Feng, Wu-chun PLoS One Research Article Public Library of Science 2021-04-22 /pmc/articles/PMC8062096/ /pubmed/33886589 http://dx.doi.org/10.1371/journal.pone.0249410 Text en © 2021 Dash et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title iBLAST: Incremental BLAST of new sequences via automated e-value correction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8062096/
https://www.ncbi.nlm.nih.gov/pubmed/33886589
http://dx.doi.org/10.1371/journal.pone.0249410
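
The abstract quotes a speedup of (1 + δ)/δ and describes recomputing e-values before combining old and new hits. The sketch below illustrates both ideas under simplifying assumptions and is not the authors' implementation. It assumes search time is roughly proportional to database size, so a full rerun costs about 1 + δ while the incremental search costs about δ. It also assumes the basic Karlin-Altschul form E = K·m·n·e^(−λS), in which an e-value at a fixed score scales linearly with the database size n; real BLAST uses effective lengths, so actual corrections may differ. The names `Hit`, `incremental_speedup`, `rescale_evalue`, and `merge_hits` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


def incremental_speedup(delta: float) -> float:
    """Speedup (1 + delta) / delta quoted in the abstract.

    delta is the database growth as a fraction of the original size.
    ASSUMPTION: search time is proportional to database size.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return (1.0 + delta) / delta


@dataclass
class Hit:
    subject_id: str
    evalue: float


def rescale_evalue(evalue: float, searched_size: float, total_size: float) -> float:
    """Rescale an e-value from the searched part to the whole database.

    ASSUMPTION: E is linear in database size at a fixed alignment score.
    """
    return evalue * total_size / searched_size


def merge_hits(old_hits: List[Hit], new_hits: List[Hit],
               old_size: float, added_size: float) -> List[Hit]:
    """Combine hits from the old database and the newly added part.

    Each e-value is rescaled to the combined size, then all hits are
    sorted so the smallest (best) e-value comes first.
    """
    total = old_size + added_size
    merged = [Hit(h.subject_id, rescale_evalue(h.evalue, old_size, total))
              for h in old_hits]
    merged += [Hit(h.subject_id, rescale_evalue(h.evalue, added_size, total))
               for h in new_hits]
    return sorted(merged, key=lambda h: h.evalue)
```

For example, under these assumptions a database that has grown by a quarter (δ = 0.25) gives `incremental_speedup(0.25) == 5.0`, and the smaller the growth, the larger the projected saving.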