Metagenomic classification with KrakenUniq on low-memory computers
Kraken and KrakenUniq are widely-used tools for classifying metagenomics sequences. A key requirement for these systems is a database containing all k-mers from all genomes that the users want to be able to detect, where k = 31 by default. This database can be very large, easily exceeding 100 gigaby...
Main authors: | Pockrandt, Christopher; Zimin, Aleksey V.; Salzberg, Steven L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10438097/ https://www.ncbi.nlm.nih.gov/pubmed/37602140 http://dx.doi.org/10.21105/joss.04908 |
_version_ | 1785092712015331328 |
---|---|
author | Pockrandt, Christopher; Zimin, Aleksey V.; Salzberg, Steven L. |
author_facet | Pockrandt, Christopher; Zimin, Aleksey V.; Salzberg, Steven L. |
author_sort | Pockrandt, Christopher |
collection | PubMed |
description | Kraken and KrakenUniq are widely used tools for classifying metagenomic sequences. A key requirement for these systems is a database containing all k-mers from all genomes that the users want to be able to detect, where k = 31 by default. This database can be very large, easily exceeding 100 gigabytes (GB) and sometimes 400 GB. Previously, Kraken and KrakenUniq required loading the entire database into main memory (RAM), and if RAM was insufficient, they used memory mapping, which significantly increased the running time for large datasets. We have implemented a new algorithm in KrakenUniq that allows it to load and process the database in chunks, with only a modest increase in running time. This enhancement now makes it feasible to run KrakenUniq on very large datasets and huge databases on virtually any computer, even a laptop, while providing the same very high classification accuracy as the previous system. STATEMENT OF NEED: The KrakenUniq software classifies reads from metagenomic samples to establish which organisms are present in the samples and estimate their abundance. The software is widely used by researchers and clinicians in medical diagnostics, microbiome studies, and environmental studies. Typical databases used by KrakenUniq are tens to hundreds of gigabytes in size. The original KrakenUniq code required loading the entire database into RAM, which demanded expensive high-memory servers to run it efficiently. If a user did not have enough physical RAM to load the entire database, KrakenUniq resorted to memory-mapping the database, which significantly increased run times, frequently by a factor of more than 100. The new functionality described in this paper enables users who do not have access to high-memory servers to run KrakenUniq efficiently, with CPU time increasing only 3- to 4-fold, compared with the 100-fold or greater increase under memory mapping. |
format | Online Article Text |
id | pubmed-10438097 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
record_format | MEDLINE/PubMed |
spelling | pubmed-10438097 2023-08-18 Metagenomic classification with KrakenUniq on low-memory computers Pockrandt, Christopher; Zimin, Aleksey V.; Salzberg, Steven L. J Open Source Softw Article 2022 2022-12-28 /pmc/articles/PMC10438097/ /pubmed/37602140 http://dx.doi.org/10.21105/joss.04908 Text en https://creativecommons.org/licenses/by/4.0/ Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Pockrandt, Christopher Zimin, Aleksey V. Salzberg, Steven L. Metagenomic classification with KrakenUniq on low-memory computers |
title | Metagenomic classification with KrakenUniq on low-memory computers |
title_full | Metagenomic classification with KrakenUniq on low-memory computers |
title_fullStr | Metagenomic classification with KrakenUniq on low-memory computers |
title_full_unstemmed | Metagenomic classification with KrakenUniq on low-memory computers |
title_short | Metagenomic classification with KrakenUniq on low-memory computers |
title_sort | metagenomic classification with krakenuniq on low-memory computers |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10438097/ https://www.ncbi.nlm.nih.gov/pubmed/37602140 http://dx.doi.org/10.21105/joss.04908 |
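
The abstract in this record describes loading and processing the k-mer database in chunks instead of holding it all in RAM. The following is a minimal Python sketch of that general idea only, not KrakenUniq's actual implementation. Every name in it (`kmers`, `db_chunks`, `classify_in_chunks`) is hypothetical, the database is modeled as a small sorted list of (k-mer, taxon ID) pairs, and the demo uses k = 3 rather than the default k = 31 so that it stays readable.

```python
from typing import Dict, Iterator, List, Tuple

K = 31  # default k-mer length stated in the abstract


def kmers(read: str, k: int = K) -> Iterator[str]:
    """Yield every overlapping k-mer of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def db_chunks(db: List[Tuple[str, int]], chunk_size: int) -> Iterator[Dict[str, int]]:
    """Yield the database piece by piece (hypothetical stand-in for reading
    successive parts of a database file from disk)."""
    for start in range(0, len(db), chunk_size):
        yield dict(db[start:start + chunk_size])


def classify_in_chunks(
    reads: List[str], db: List[Tuple[str, int]], chunk_size: int, k: int = K
) -> List[Dict[int, int]]:
    """Count taxon hits per read while holding only one database chunk at a time."""
    hits: List[Dict[int, int]] = [dict() for _ in reads]
    # Extract each read's k-mers once; they are re-checked against every chunk.
    read_kmers = [list(kmers(r, k)) for r in reads]
    for chunk in db_chunks(db, chunk_size):
        for idx, ks in enumerate(read_kmers):
            for km in ks:
                taxon = chunk.get(km)
                if taxon is not None:
                    hits[idx][taxon] = hits[idx].get(taxon, 0) + 1
    return hits


if __name__ == "__main__":
    # Toy database with made-up taxon IDs; k = 3 for illustration only.
    toy_db = sorted({"ACG": 1, "CGT": 2, "GTA": 1, "TTT": 3}.items())
    print(classify_in_chunks(["ACGTA", "TTTT"], toy_db, chunk_size=2, k=3))
```

In this sketch the per-read hit counts do not depend on `chunk_size`, because each k-mer appears in exactly one chunk. That mirrors the abstract's claim that chunked processing preserves classification accuracy, with the cost being a repeated pass over the reads for every chunk.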