
miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST

A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is l...


Bibliographic Details
Main authors:	Kim, You Jung; Boyd, Andrew; Athey, Brian D.; Patel, Jignesh M.
Format:	Text
Language:	English
Published:	Oxford University Press 2005
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182166/ https://www.ncbi.nlm.nih.gov/pubmed/16061938 http://dx.doi.org/10.1093/nar/gki739

_version_	1782124646995329024
author	Kim, You Jung Boyd, Andrew Athey, Brian D. Patel, Jignesh M.
author_facet	Kim, You Jung Boyd, Andrew Athey, Brian D. Patel, Jignesh M.
author_sort	Kim, You Jung
collection	PubMed
description	A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At its core, miBLAST employs q-gram filtering and an index join to detect similarity between the query sequences and the database sequences efficiently. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users.
format	Text
id	pubmed-1182166
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-1182166 2005-08-03 miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST Kim, You Jung Boyd, Andrew Athey, Brian D. Patel, Jignesh M. Nucleic Acids Res Article A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At the core, miBLAST employs a q-gram filtering and an index join for efficiently detecting similarity between the query sequences and database sequences. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users. Oxford University Press 2005 2005-08-01 /pmc/articles/PMC1182166/ /pubmed/16061938 http://dx.doi.org/10.1093/nar/gki739 Text en © The Author 2005. Published by Oxford University Press. All rights reserved
spellingShingle	Article Kim, You Jung Boyd, Andrew Athey, Brian D. Patel, Jignesh M. miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title	miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title_full	miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title_fullStr	miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title_full_unstemmed	miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title_short	miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
title_sort	miblast: scalable evaluation of a batch of nucleotide sequence queries with blast
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182166/ https://www.ncbi.nlm.nih.gov/pubmed/16061938 http://dx.doi.org/10.1093/nar/gki739
work_keys_str_mv	AT kimyoujung miblastscalableevaluationofabatchofnucleotidesequencequerieswithblast AT boydandrew miblastscalableevaluationofabatchofnucleotidesequencequerieswithblast AT atheybriand miblastscalableevaluationofabatchofnucleotidesequencequerieswithblast AT pateljigneshm miblastscalableevaluationofabatchofnucleotidesequencequerieswithblast
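
The description field above mentions q-gram filtering combined with an index join over both the query set and the database set. The following is only a minimal sketch of that general idea, not miBLAST's actual implementation: the function names `qgram_index` and `index_join`, the in-memory dictionaries, and the toy sequences are all assumptions made for illustration. It indexes every length-q substring of both sets and joins the two indexes on shared q-grams, yielding candidate (query, database) pairs with seed positions that a later alignment step could extend.

```python
from collections import defaultdict


def qgram_index(seqs, q):
    """Map each q-gram to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for sid, seq in seqs.items():
        for i in range(len(seq) - q + 1):
            index[seq[i:i + q]].append((sid, i))
    return index


def index_join(queries, database, q):
    """Join the query and database q-gram indexes on shared q-grams.

    Returns a dict mapping (query id, database id) to a list of
    (query offset, database offset) seed positions. Hypothetical helper,
    not part of any BLAST or miBLAST API.
    """
    q_idx = qgram_index(queries, q)
    d_idx = qgram_index(database, q)
    hits = defaultdict(list)
    # Only q-grams present in both indexes can seed a match.
    for gram in q_idx.keys() & d_idx.keys():
        for qid, qpos in q_idx[gram]:
            for did, dpos in d_idx[gram]:
                hits[(qid, did)].append((qpos, dpos))
    return hits


# Toy data (invented for illustration only).
queries = {"q1": "ACGTAC", "q2": "TTTTTT"}
database = {"d1": "GGACGTACGG", "d2": "CCCCCCCC"}
hits = index_join(queries, database, q=4)
for pair in sorted(hits):
    print(pair, sorted(hits[pair]))
```

Because both sides are indexed once, every query in the batch shares a single pass over the database index instead of triggering its own scan, which is the set-oriented property the description attributes to the approach.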
