Metagenomic classification with KrakenUniq on low-memory computers

Bibliographic Details
Main authors:	Pockrandt, Christopher; Zimin, Aleksey V.; Salzberg, Steven L.
Format:	Online Article Text
Language:	English
Published:	2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10438097/ https://www.ncbi.nlm.nih.gov/pubmed/37602140 http://dx.doi.org/10.21105/joss.04908

author	Pockrandt, Christopher; Zimin, Aleksey V.; Salzberg, Steven L.
collection	PubMed
description	Kraken and KrakenUniq are widely used tools for classifying metagenomic sequences. A key requirement for these systems is a database containing all k-mers from all genomes that users want to be able to detect, where k = 31 by default. This database can be very large, easily exceeding 100 gigabytes (GB) and sometimes 400 GB. Previously, Kraken and KrakenUniq required loading the entire database into main memory (RAM); if RAM was insufficient, they used memory mapping, which significantly increased the running time for large datasets. We have implemented a new algorithm in KrakenUniq that allows it to load and process the database in chunks, with only a modest increase in running time. This enhancement makes it feasible to run KrakenUniq on very large datasets and huge databases on virtually any computer, even a laptop, while providing the same very high classification accuracy as the previous system. STATEMENT OF NEED: The KrakenUniq software classifies reads from metagenomic samples to establish which organisms are present in the samples and to estimate their abundance. The software is widely used by researchers and clinicians in medical diagnostics, microbiome studies, and environmental studies. Typical databases used by KrakenUniq are tens to hundreds of gigabytes in size. The original KrakenUniq code required loading the entire database into RAM, which demanded expensive high-memory servers to run it efficiently. If a user did not have enough physical RAM to load the entire database, KrakenUniq resorted to memory-mapping the database, which significantly increased run times, frequently by a factor of more than 100. The new functionality described in this paper enables users who do not have access to high-memory servers to run KrakenUniq efficiently: CPU time increases only 3- to 4-fold, down from more than 100-fold with memory mapping.
format	Online Article Text
id	pubmed-10438097
institution	National Center for Biotechnology Information
language	English
publishDate	2022
record_format	MEDLINE/PubMed
spelling	pubmed-10438097 2023-08-18 Metagenomic classification with KrakenUniq on low-memory computers Pockrandt, Christopher; Zimin, Aleksey V.; Salzberg, Steven L. J Open Source Softw Article 2022 2022-12-28 /pmc/articles/PMC10438097/ /pubmed/37602140 http://dx.doi.org/10.21105/joss.04908 Text en https://creativecommons.org/licenses/by/4.0/ Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/).
title	Metagenomic classification with KrakenUniq on low-memory computers
topic	Article
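The chunked approach summarized in the description (load one part of the k-mer database, process the reads against it, move on to the next part) can be illustrated with a minimal Python sketch. It is not KrakenUniq's code and rests on several assumptions: the database is modeled as an in-memory dict mapping each k-mer to a taxon ID, a "chunk" is simply a sorted slice of that dict, reverse complements are ignored, and each read is assigned the taxon with the most k-mer hits (a majority vote standing in for Kraken's actual classification rule). The helper names `kmers`, `db_chunks`, and `classify_chunked` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator

K = 31  # default k-mer length stated in the description

# Hypothetical toy inputs, invented for illustration (k = 5 keeps them short).
TOY_DB: Dict[str, int] = {"ACGTA": 1, "CGTAC": 1, "TTTTT": 2, "GGGGG": 3}
TOY_READS: Dict[str, str] = {"r1": "ACGTACG", "r2": "TTTTTT", "r3": "CCCCCC"}


def kmers(seq: str, k: int = K) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (reverse complements are ignored)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def db_chunks(db: Dict[str, int], chunk_size: int) -> Iterator[Dict[str, int]]:
    """Yield the k-mer -> taxon table in sorted slices of at most chunk_size entries.

    Each slice stands in for one chunk read from disk; a real tool would never
    hold the whole table, but this toy version sorts it in memory for simplicity.
    """
    items = sorted(db.items())
    for start in range(0, len(items), chunk_size):
        yield dict(items[start:start + chunk_size])


def classify_chunked(reads: Dict[str, str], db: Dict[str, int],
                     chunk_size: int, k: int = K) -> Dict[str, int]:
    """Classify reads by scanning all of them once per database chunk."""
    hits: Dict[str, Counter] = {read_id: Counter() for read_id in reads}
    for chunk in db_chunks(db, chunk_size):
        # Every chunk costs a full pass over all reads.
        for read_id, seq in reads.items():
            for kmer in kmers(seq, k):
                taxon = chunk.get(kmer)
                if taxon is not None:
                    hits[read_id][taxon] += 1
    # Simplified final call: the taxon with the most k-mer hits wins; 0 = unclassified.
    # This majority vote is a stand-in, not Kraken's taxonomy-based scoring.
    return {read_id: (counts.most_common(1)[0][0] if counts else 0)
            for read_id, counts in hits.items()}


if __name__ == "__main__":
    print(classify_chunked(TOY_READS, TOY_DB, chunk_size=1, k=5))
```

With `chunk_size=1` the toy run prints `{'r1': 1, 'r2': 2, 'r3': 0}`, the same result as a single chunk holding the whole table. In this sketch peak memory is bounded by the chunk size, while every additional chunk adds another pass over all reads, which is one way chunking trades a modest increase in running time for a much smaller memory footprint.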

Similar items