Indexes of Large Genome Collections on a PC

Bibliographic Details
Main authors: Danek, Agnieszka; Deorowicz, Sebastian; Grabowski, Szymon
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4188820/
https://www.ncbi.nlm.nih.gov/pubmed/25289699
http://dx.doi.org/10.1371/journal.pone.0109384
_version_ 1782338274620080128
author Danek, Agnieszka
Deorowicz, Sebastian
Grabowski, Szymon
author_facet Danek, Agnieszka
Deorowicz, Sebastian
Grabowski, Szymon
author_sort Danek, Agnieszka
collection PubMed
description The availability of thousands of individual genomes of one species should boost rapid progress in personalized medicine or understanding of the interaction between genotype and phenotype, to name a few applications. A key operation useful in such analyses is aligning sequencing reads against a collection of genomes, which is costly with the use of existing algorithms due to their large memory requirements. We present MuGI, Multiple Genome Index, which reports all occurrences of a given pattern, in exact and approximate matching model, against a collection of thousand(s) genomes. Its unique feature is the small index size, which is customisable. It fits in a standard computer with 16–32 GB, or even 8 GB, of RAM, for the 1000GP collection of 1092 diploid human genomes. The solution is also fast. For example, the exact matching queries (of average length 150 bp) are handled in average time of 39 µs and with up to 3 mismatches in 373 µs on the test PC with the index size of 13.4 GB. For a smaller index, occupying 7.4 GB in memory, the respective times grow to 76 µs and 917 µs. Software is available at http://sun.aei.polsl.pl/mugi under a free license. Data S1 is available at PLOS One online.
format Online
Article
Text
id pubmed-4188820
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4188820 2014-10-10 Indexes of Large Genome Collections on a PC Danek, Agnieszka Deorowicz, Sebastian Grabowski, Szymon PLoS One Research Article Public Library of Science 2014-10-07 /pmc/articles/PMC4188820/ /pubmed/25289699 http://dx.doi.org/10.1371/journal.pone.0109384 Text en © 2014 Danek et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Danek, Agnieszka
Deorowicz, Sebastian
Grabowski, Szymon
Indexes of Large Genome Collections on a PC
title Indexes of Large Genome Collections on a PC
title_full Indexes of Large Genome Collections on a PC
title_fullStr Indexes of Large Genome Collections on a PC
title_full_unstemmed Indexes of Large Genome Collections on a PC
title_short Indexes of Large Genome Collections on a PC
title_sort indexes of large genome collections on a pc
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4188820/
https://www.ncbi.nlm.nih.gov/pubmed/25289699
http://dx.doi.org/10.1371/journal.pone.0109384
work_keys_str_mv AT danekagnieszka indexesoflargegenomecollectionsonapc
AT deorowiczsebastian indexesoflargegenomecollectionsonapc
AT grabowskiszymon indexesoflargegenomecollectionsonapc