
The K-mer File Format: a standardized and compact disk representation of sets of k-mers

SUMMARY: Bioinformatics applications increasingly rely on ad hoc disk storage of k-mer sets, e.g. for de Bruijn graphs or alignment indexes. Here, we introduce the K-mer File Format as a general lossless framework for storing and manipulating k-mer sets, realizing space savings of 3–5× compared to o...


Bibliographic Details
Main authors: Dufresne, Yoann; Lemane, Teo; Marijon, Pierre; Peterlongo, Pierre; Rahman, Amatur; Kokot, Marek; Medvedev, Paul; Deorowicz, Sebastian; Chikhi, Rayan
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9477520/
https://www.ncbi.nlm.nih.gov/pubmed/35904548
http://dx.doi.org/10.1093/bioinformatics/btac528
author Dufresne, Yoann
Lemane, Teo
Marijon, Pierre
Peterlongo, Pierre
Rahman, Amatur
Kokot, Marek
Medvedev, Paul
Deorowicz, Sebastian
Chikhi, Rayan
author_sort Dufresne, Yoann
collection PubMed
description SUMMARY: Bioinformatics applications increasingly rely on ad hoc disk storage of k-mer sets, e.g. for de Bruijn graphs or alignment indexes. Here, we introduce the K-mer File Format as a general lossless framework for storing and manipulating k-mer sets, realizing space savings of 3–5× compared to other formats, and bringing interoperability across tools. AVAILABILITY AND IMPLEMENTATION: Format specification, C++/Rust API, tools: https://github.com/Kmer-File-Format/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9477520
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9477520 2022-09-19 The K-mer File Format: a standardized and compact disk representation of sets of k-mers Dufresne, Yoann Lemane, Teo Marijon, Pierre Peterlongo, Pierre Rahman, Amatur Kokot, Marek Medvedev, Paul Deorowicz, Sebastian Chikhi, Rayan Bioinformatics Applications Note SUMMARY: Bioinformatics applications increasingly rely on ad hoc disk storage of k-mer sets, e.g. for de Bruijn graphs or alignment indexes. Here, we introduce the K-mer File Format as a general lossless framework for storing and manipulating k-mer sets, realizing space savings of 3–5× compared to other formats, and bringing interoperability across tools. AVAILABILITY AND IMPLEMENTATION: Format specification, C++/Rust API, tools: https://github.com/Kmer-File-Format/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2022-07-29 /pmc/articles/PMC9477520/ /pubmed/35904548 http://dx.doi.org/10.1093/bioinformatics/btac528 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title The K-mer File Format: a standardized and compact disk representation of sets of k-mers
topic Applications Note