MICA: desktop software for comprehensive searching of DNA databases

BACKGROUND: Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remo...

Bibliographic Details
Main Authors: Stokes, William A, Glick, Benjamin S
Format: Text
Language: English
Published: BioMed Central 2006
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1618866/
https://www.ncbi.nlm.nih.gov/pubmed/17018144
http://dx.doi.org/10.1186/1471-2105-7-427
author Stokes, William A
Glick, Benjamin S
collection PubMed
description BACKGROUND: Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. RESULTS: MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited. CONCLUSION: MICA is suitable as a search engine for desktop DNA analysis software.
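The kind of search the abstract describes (exact matches for a partially degenerate query against a k-mer index) can be illustrated with a minimal Python sketch. This is not MICA's algorithm or on-disk index format, neither of which is given in this record. The names `IUPAC`, `build_index`, and `search` are hypothetical, the index is a plain in-memory dictionary, and the database sequence is assumed to contain only A, C, G, and T, with degenerate codes allowed only in the query.

```python
from collections import defaultdict
from itertools import product

# The 15 IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def build_index(seq, k):
    """Map every k-mer in seq to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def search(seq, index, k, query):
    """Return sorted start positions where query matches seq exactly.

    Assumption of this sketch: the query is at least k characters long.
    The first k query characters are expanded into concrete k-mers
    (seeds), looked up in the index, and each candidate position is
    then verified against the full query.
    """
    if len(query) < k:
        raise ValueError("query must be at least k characters long")
    seeds = ("".join(p) for p in product(*(IUPAC[c] for c in query[:k])))
    hits = []
    for seed in seeds:
        for pos in index.get(seed, ()):
            window = seq[pos:pos + len(query)]
            if len(window) == len(query) and all(
                base in IUPAC[code] for code, base in zip(query, window)
            ):
                hits.append(pos)
    return sorted(hits)


seq = "ACGTACGAACGT"
index = build_index(seq, 3)
print(search(seq, index, 3, "ACGW"))  # [0, 4, 8]
```

Only the index entries for the expanded seeds are consulted, which loosely mirrors the abstract's point that a typical search reads a small fraction of the index. A heavily degenerate seed (for example `NNN`) expands to many k-mers, so a practical tool would need a smarter choice of seed than this sketch makes.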
format Text
id pubmed-1618866
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1618866 2006-10-21 MICA: desktop software for comprehensive searching of DNA databases Stokes, William A Glick, Benjamin S BMC Bioinformatics Software BioMed Central 2006-10-03 /pmc/articles/PMC1618866/ /pubmed/17018144 http://dx.doi.org/10.1186/1471-2105-7-427 Text en Copyright © 2006 Stokes and Glick; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title MICA: desktop software for comprehensive searching of DNA databases
topic Software