
Indexing Strategies for Rapid Searches of Short Words in Genome Sequences

Searching for matches between large collections of short (14–30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed tw...


Bibliographic Details
Main Authors: Iseli, Christian, Ambrosini, Giovanna, Bucher, Philipp, Jongeneel, C. Victor
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1894650/
https://www.ncbi.nlm.nih.gov/pubmed/17593978
http://dx.doi.org/10.1371/journal.pone.0000579
_version_ 1782133871691694080
author Iseli, Christian
Ambrosini, Giovanna
Bucher, Philipp
Jongeneel, C. Victor
author_facet Iseli, Christian
Ambrosini, Giovanna
Bucher, Philipp
Jongeneel, C. Victor
author_sort Iseli, Christian
collection PubMed
description Searching for matches between large collections of short (14–30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries.
format Text
id pubmed-1894650
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-18946502007-08-23 Indexing Strategies for Rapid Searches of Short Words in Genome Sequences Iseli, Christian Ambrosini, Giovanna Bucher, Philipp Jongeneel, C. Victor PLoS One Research Article Searching for matches between large collections of short (14–30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries. Public Library of Science 2007-06-27 /pmc/articles/PMC1894650/ /pubmed/17593978 http://dx.doi.org/10.1371/journal.pone.0000579 Text en Iseli et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Iseli, Christian
Ambrosini, Giovanna
Bucher, Philipp
Jongeneel, C. Victor
Indexing Strategies for Rapid Searches of Short Words in Genome Sequences
title Indexing Strategies for Rapid Searches of Short Words in Genome Sequences
title_full Indexing Strategies for Rapid Searches of Short Words in Genome Sequences
title_fullStr Indexing Strategies for Rapid Searches of Short Words in Genome Sequences
title_full_unstemmed Indexing Strategies for Rapid Searches of Short Words in Genome Sequences
title_short Indexing Strategies for Rapid Searches of Short Words in Genome Sequences
title_sort indexing strategies for rapid searches of short words in genome sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1894650/
https://www.ncbi.nlm.nih.gov/pubmed/17593978
http://dx.doi.org/10.1371/journal.pone.0000579
work_keys_str_mv AT iselichristian indexingstrategiesforrapidsearchesofshortwordsingenomesequences
AT ambrosinigiovanna indexingstrategiesforrapidsearchesofshortwordsingenomesequences
AT bucherphilipp indexingstrategiesforrapidsearchesofshortwordsingenomesequences
AT jongeneelcvictor indexingstrategiesforrapidsearchesofshortwordsingenomesequences