MICA: desktop software for comprehensive searching of DNA databases
BACKGROUND: Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remo...
Main Authors: | Stokes, William A; Glick, Benjamin S |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2006 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1618866/ https://www.ncbi.nlm.nih.gov/pubmed/17018144 http://dx.doi.org/10.1186/1471-2105-7-427 |
_version_ | 1782130536452456448 |
---|---|
author | Stokes, William A; Glick, Benjamin S |
author_facet | Stokes, William A; Glick, Benjamin S |
author_sort | Stokes, William A |
collection | PubMed |
description | BACKGROUND: Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. RESULTS: MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited. CONCLUSION: MICA is suitable as a search engine for desktop DNA analysis software. |
format | Text |
id | pubmed-1618866 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1618866 2006-10-21 MICA: desktop software for comprehensive searching of DNA databases Stokes, William A Glick, Benjamin S BMC Bioinformatics Software BACKGROUND: Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. RESULTS: MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited. CONCLUSION: MICA is suitable as a search engine for desktop DNA analysis software. BioMed Central 2006-10-03 /pmc/articles/PMC1618866/ /pubmed/17018144 http://dx.doi.org/10.1186/1471-2105-7-427 Text en Copyright © 2006 Stokes and Glick; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Stokes, William A Glick, Benjamin S MICA: desktop software for comprehensive searching of DNA databases |
title | MICA: desktop software for comprehensive searching of DNA databases |
title_full | MICA: desktop software for comprehensive searching of DNA databases |
title_fullStr | MICA: desktop software for comprehensive searching of DNA databases |
title_full_unstemmed | MICA: desktop software for comprehensive searching of DNA databases |
title_short | MICA: desktop software for comprehensive searching of DNA databases |
title_sort | mica: desktop software for comprehensive searching of dna databases |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1618866/ https://www.ncbi.nlm.nih.gov/pubmed/17018144 http://dx.doi.org/10.1186/1471-2105-7-427 |
work_keys_str_mv | AT stokeswilliama micadesktopsoftwareforcomprehensivesearchingofdnadatabases AT glickbenjamins micadesktopsoftwareforcomprehensivesearchingofdnadatabases |
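
The description field above mentions k-mer indexing and exact matching of partially degenerate queries. The following is a minimal Python sketch of that general idea, not MICA's actual algorithm or on-disk format, neither of which is given in this record. The function names (`build_index`, `expand`, `search`) are hypothetical, the index is a plain in-memory dictionary rather than a compact array, and the database sequence is assumed to contain only A, C, G, and T (MICA itself is described as handling all 15 characters). The symbol table is the standard IUPAC nucleotide code set.

```python
from collections import defaultdict
from itertools import product

# Standard IUPAC nucleotide codes: 4 bases plus 11 degenerate symbols (15 total).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def build_index(sequence, k):
    """Map each k-mer in the sequence to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def expand(query):
    """Yield every nondegenerate sequence that a degenerate query can match."""
    for combo in product(*(IUPAC[symbol] for symbol in query)):
        yield "".join(combo)


def search(sequence, index, k, query):
    """Return sorted start positions of exact matches for a degenerate query.

    Assumption of this sketch: the query is at least k symbols long, and
    the database sequence is nondegenerate.
    """
    if len(query) < k:
        raise ValueError("query must be at least k symbols long")
    hits = set()
    # Look up each concrete expansion of the first k query symbols,
    # then verify the full query against the sequence at each candidate.
    for seed in expand(query[:k]):
        for pos in index.get(seed, ()):
            window = sequence[pos:pos + len(query)]
            if len(window) == len(query) and all(
                base in IUPAC[symbol] for symbol, base in zip(query, window)
            ):
                hits.add(pos)
    return sorted(hits)


if __name__ == "__main__":
    dna = "ACGTACGTTAGC"
    idx = build_index(dna, 3)
    print(search(dna, idx, 3, "TAS"))  # S matches C or G
```

Only the candidate positions listed under the seed k-mers are examined, which mirrors the record's point that a typical search reads a small fraction of the index. A highly degenerate seed (for example, several `N` symbols) expands to many k-mers, so a practical tool would need a smarter seed choice than taking the first k symbols.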