Indexing Strategies for Rapid Searches of Short Words in Genome Sequences
Searching for matches between large collections of short (14–30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed tw...
Main authors: | Iseli, Christian; Ambrosini, Giovanna; Bucher, Philipp; Jongeneel, C. Victor |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1894650/ https://www.ncbi.nlm.nih.gov/pubmed/17593978 http://dx.doi.org/10.1371/journal.pone.0000579 |
_version_ | 1782133871691694080 |
---|---|
author | Iseli, Christian Ambrosini, Giovanna Bucher, Philipp Jongeneel, C. Victor |
author_facet | Iseli, Christian Ambrosini, Giovanna Bucher, Philipp Jongeneel, C. Victor |
author_sort | Iseli, Christian |
collection | PubMed |
description | Searching for matches between large collections of short (14–30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries. |
format | Text |
id | pubmed-1894650 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-1894650 2007-08-23 Indexing Strategies for Rapid Searches of Short Words in Genome Sequences Iseli, Christian Ambrosini, Giovanna Bucher, Philipp Jongeneel, C. Victor PLoS One Research Article Searching for matches between large collections of short (14–30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries. Public Library of Science 2007-06-27 /pmc/articles/PMC1894650/ /pubmed/17593978 http://dx.doi.org/10.1371/journal.pone.0000579 Text en Iseli et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Iseli, Christian Ambrosini, Giovanna Bucher, Philipp Jongeneel, C. Victor Indexing Strategies for Rapid Searches of Short Words in Genome Sequences |
title | Indexing Strategies for Rapid Searches of Short Words in Genome Sequences |
title_full | Indexing Strategies for Rapid Searches of Short Words in Genome Sequences |
title_fullStr | Indexing Strategies for Rapid Searches of Short Words in Genome Sequences |
title_full_unstemmed | Indexing Strategies for Rapid Searches of Short Words in Genome Sequences |
title_short | Indexing Strategies for Rapid Searches of Short Words in Genome Sequences |
title_sort | indexing strategies for rapid searches of short words in genome sequences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1894650/ https://www.ncbi.nlm.nih.gov/pubmed/17593978 http://dx.doi.org/10.1371/journal.pone.0000579 |
work_keys_str_mv | AT iselichristian indexingstrategiesforrapidsearchesofshortwordsingenomesequences AT ambrosinigiovanna indexingstrategiesforrapidsearchesofshortwordsingenomesequences AT bucherphilipp indexingstrategiesforrapidsearchesofshortwordsingenomesequences AT jongeneelcvictor indexingstrategiesforrapidsearchesofshortwordsingenomesequences |