
Sequence Search Algorithms for Single Pass Sequence Identification: Does One Size Fit All?

Bioinformatic tools have become essential to biologists in their quest to understand the vast quantities of sequence data, and now whole genomes, which are being produced at an ever increasing rate. Much of these sequence data are single-pass sequences, such as sample sequences from organisms closel...


Bibliographic Details
Main authors:	Woodwark, K. Cara; Hubbard, Simon J.; Oliver, Stephen G.
Format:	Text
Language:	English
Published:	Hindawi Publishing Corporation 2001
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2447189/ https://www.ncbi.nlm.nih.gov/pubmed/18628895 http://dx.doi.org/10.1002/cfg.61

author	Woodwark, K. Cara; Hubbard, Simon J.; Oliver, Stephen G.
collection	PubMed
description	Bioinformatic tools have become essential to biologists in their quest to understand the vast quantities of sequence data, and now whole genomes, which are being produced at an ever increasing rate. Much of these sequence data are single-pass sequences, such as sample sequences from organisms closely related to other organisms of interest which have already been sequenced, or cDNAs or expressed sequence tags (ESTs). These single-pass sequences often contain errors, including frameshifts, which complicate the identification of homologues, especially at the protein level. Therefore, sequence searches with this type of data are often performed at the nucleotide level. The most commonly used sequence search algorithms for the identification of homologues are Washington University’s and the National Center for Biotechnology Information's (NCBI) versions of the BLAST suites of tools, which are to be found on websites all over the world. The work reported here examines the use of these tools for comparing sample sequence datasets to a known genome. It shows that care must be taken when choosing the parameters to use with the BLAST algorithms. NCBI’s version of gapped BLASTn gives much shorter, and sometimes different, top alignments to those found using Washington University’s version of BLASTn (which also allows for gaps), when both are used with their default parameters. Most of the differences in performance were found to be due to the choices of default parameters rather than underlying differences between the two algorithms. Washington University’s version, used with defaults, compares very favourably with the results obtained using the accurate but computationally intensive Smith–Waterman algorithm.
format	Text
id	pubmed-2447189
institution	National Center for Biotechnology Information
language	English
publishDate	2001
publisher	Hindawi Publishing Corporation
record_format	MEDLINE/PubMed
spelling	pubmed-2447189 2008-07-14 Comp Funct Genomics Research Article Hindawi Publishing Corporation 2001-02 /pmc/articles/PMC2447189/ /pubmed/18628895 http://dx.doi.org/10.1002/cfg.61 Text en Copyright © 2001 Hindawi Publishing Corporation. http://creativecommons.org/licenses/by/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Sequence Search Algorithms for Single Pass Sequence Identification: Does One Size Fit All?
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2447189/ https://www.ncbi.nlm.nih.gov/pubmed/18628895 http://dx.doi.org/10.1002/cfg.61

Similar Items