Metagenomic classification with KrakenUniq on low-memory computers

Bibliographic Details
Main authors: Pockrandt, Christopher; Zimin, Aleksey V.; Salzberg, Steven L.
Format: Online, Article, Text
Language: English
Published: 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10438097/
https://www.ncbi.nlm.nih.gov/pubmed/37602140
http://dx.doi.org/10.21105/joss.04908
Collection: PubMed
Description: Kraken and KrakenUniq are widely used tools for classifying metagenomic sequences. A key requirement for these systems is a database containing all k-mers from all genomes that users want to be able to detect, where k = 31 by default. This database can be very large, easily exceeding 100 gigabytes (GB) and sometimes reaching 400 GB. Previously, Kraken and KrakenUniq required loading the entire database into main memory (RAM); if RAM was insufficient, they fell back on memory mapping, which significantly increased the running time for large datasets. We have implemented a new algorithm in KrakenUniq that loads and processes the database in chunks, with only a modest increase in running time. This enhancement makes it feasible to run KrakenUniq on very large datasets and huge databases on virtually any computer, even a laptop, while providing the same very high classification accuracy as the previous system.

Statement of need: The KrakenUniq software classifies reads from metagenomic samples to establish which organisms are present and to estimate their abundance. It is widely used by researchers and clinicians in medical diagnostics, microbiome studies, and environmental studies. Typical KrakenUniq databases are tens to hundreds of gigabytes in size. The original KrakenUniq code required loading the entire database into RAM, which demanded expensive high-memory servers to run efficiently. If a user did not have enough physical RAM to hold the entire database, KrakenUniq resorted to memory-mapping it, which increased run times substantially, frequently by a factor of more than 100. The new functionality described in this paper lets users without access to high-memory servers run KrakenUniq efficiently: CPU time increases only 3- to 4-fold, rather than more than 100-fold.
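The idea of processing the database in chunks can be illustrated with a minimal Python sketch. It is not KrakenUniq's implementation: the in-memory dictionary standing in for an on-disk database, the partitioning by sorted k-mer, the "most hits wins" decision rule, and the names `kmers`, `db_chunks` and `classify_chunked` are all illustrative assumptions. Real classifiers also handle reverse complements and use the taxonomy tree, both of which are omitted here.

```python
from collections import Counter
from typing import Dict, Iterator, List, Optional

K = 31  # default k-mer length stated in the abstract


def kmers(seq: str, k: int = K) -> Iterator[str]:
    """Yield every length-k substring (k-mer) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def db_chunks(db: Dict[str, int], chunk_size: int) -> Iterator[Dict[str, int]]:
    """Partition a k-mer -> taxon table into disjoint chunks.

    Hypothetical stand-in for reading one slice of an on-disk database
    at a time; each k-mer lands in exactly one chunk.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    items = sorted(db.items())
    for start in range(0, len(items), chunk_size):
        yield dict(items[start:start + chunk_size])


def classify_chunked(reads: List[str], db: Dict[str, int],
                     chunk_size: int, k: int = K) -> List[Optional[int]]:
    """Classify reads while holding only one database chunk at a time."""
    hits: List[Counter] = [Counter() for _ in reads]  # per-read taxon hit counts
    for chunk in db_chunks(db, chunk_size):
        # Every read is scanned against the resident chunk before the next
        # chunk is loaded, so the reads are re-scanned once per chunk.
        for counts, read in zip(hits, reads):
            for kmer in kmers(read, k):
                taxon = chunk.get(kmer)
                if taxon is not None:
                    counts[taxon] += 1
    # Simplified decision rule (hypothetical, not KrakenUniq's): most hits
    # wins, ties go to the smaller taxon id, so chunk order does not matter.
    return [min(c, key=lambda t: (-c[t], t)) if c else None for c in hits]


# Toy example with k = 4 and two made-up genomes (hypothetical data).
db: Dict[str, int] = {}
for taxon_id, genome in [(1, "ACGTACGTTT"), (2, "GGGCCCAAAT")]:
    for kmer in kmers(genome, 4):
        db[kmer] = taxon_id
reads = ["TACGTT", "GCCCAA", "TTTTTT"]
print(classify_chunked(reads, db, chunk_size=3, k=4))  # [1, 2, None]
```

Because the chunks partition the k-mer set, every lookup is answered by exactly one chunk, so `chunk_size=1` and `chunk_size=len(db)` return identical classifications. The cost is re-scanning the reads once per chunk, which is one plausible source of the modest slowdown the abstract reports.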
Record ID: pubmed-10438097
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Open Source Softw (Journal of Open Source Software)
Publication date: 2022-12-28
License: Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/).