
miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST

A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is l...


Bibliographic Details
Main Authors: Kim, You Jung, Boyd, Andrew, Athey, Brian D., Patel, Jignesh M.
Format: Text
Language: English
Published: Oxford University Press 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182166/
https://www.ncbi.nlm.nih.gov/pubmed/16061938
http://dx.doi.org/10.1093/nar/gki739
_version_ 1782124646995329024
author Kim, You Jung
Boyd, Andrew
Athey, Brian D.
Patel, Jignesh M.
author_facet Kim, You Jung
Boyd, Andrew
Athey, Brian D.
Patel, Jignesh M.
author_sort Kim, You Jung
collection PubMed
description A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At the core, miBLAST employs a q-gram filtering and an index join for efficiently detecting similarity between the query sequences and database sequences. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users.
format Text
id pubmed-1182166
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-11821662005-08-03 miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST Kim, You Jung Boyd, Andrew Athey, Brian D. Patel, Jignesh M. Nucleic Acids Res Article A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At the core, miBLAST employs a q-gram filtering and an index join for efficiently detecting similarity between the query sequences and database sequences. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users. Oxford University Press 2005 2005-08-01 /pmc/articles/PMC1182166/ /pubmed/16061938 http://dx.doi.org/10.1093/nar/gki739 Text en © The Author 2005. Published by Oxford University Press. All rights reserved
spellingShingle Article
Kim, You Jung
Boyd, Andrew
Athey, Brian D.
Patel, Jignesh M.
miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title_full miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title_fullStr miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title_full_unstemmed miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title_short miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title_sort miblast: scalable evaluation of a batch of nucleotide sequence queries with blast
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182166/
https://www.ncbi.nlm.nih.gov/pubmed/16061938
http://dx.doi.org/10.1093/nar/gki739
work_keys_str_mv AT kimyoujung miblastscalableevaluationofabatchofnucleotidesequencequerieswithblast
AT boydandrew miblastscalableevaluationofabatchofnucleotidesequencequerieswithblast
AT atheybriand miblastscalableevaluationofabatchofnucleotidesequencequerieswithblast
AT pateljigneshm miblastscalableevaluationofabatchofnucleotidesequencequerieswithblast
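
Illustrative note: the abstract describes a set-oriented approach in which both the query set and the database are indexed by q-grams and the two indexes are joined to find candidate matches. The Python sketch below shows that general idea only. It is not miBLAST's actual implementation; the function names (qgram_index, index_join), the toy sequences, and the choice q = 4 are assumptions made for illustration, and the extension and scoring steps that BLAST performs on each seed are omitted.

```python
from collections import defaultdict


def qgram_index(seqs, q):
    """Map each q-gram to the list of (sequence id, offset) where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in seqs.items():
        for i in range(len(seq) - q + 1):
            index[seq[i:i + q]].append((seq_id, i))
    return index


def index_join(query_index, db_index):
    """Join two q-gram indexes on shared q-grams.

    Returns a dict mapping (query id, database id) to the list of
    (query offset, database offset) seed positions they share.
    """
    hits = defaultdict(list)
    for gram, query_positions in query_index.items():
        db_positions = db_index.get(gram)
        if not db_positions:
            continue  # filtering step: this q-gram never occurs in the database
        for query_id, q_off in query_positions:
            for db_id, d_off in db_positions:
                hits[(query_id, db_id)].append((q_off, d_off))
    return hits


# Toy data (hypothetical): one query shares q-grams with d1, the other with nothing.
queries = {"q1": "ACGTAC", "q2": "TTTTTT"}
database = {"d1": "GGACGTACGG", "d2": "CCCCCCCC"}
q = 4

hits = index_join(qgram_index(queries, q), qgram_index(database, q))
for pair, seeds in hits.items():
    print(pair, seeds)
```

Because the whole query set is indexed once, each q-gram is looked up in the database index a single time no matter how many queries contain it, which is the intuition behind evaluating a batch together rather than one query at a time.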