
Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach



Bibliographic Details
Main authors: Meng, Qingxi, Chandak, Shubham, Zhu, Yifan, Weissman, Tsachy
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9902536/
https://www.ncbi.nlm.nih.gov/pubmed/36747011
http://dx.doi.org/10.1038/s41598-023-29267-8
_version_ 1784883284195409920
author Meng, Qingxi
Chandak, Shubham
Zhu, Yifan
Weissman, Tsachy
author_facet Meng, Qingxi
Chandak, Shubham
Zhu, Yifan
Weissman, Tsachy
author_sort Meng, Qingxi
collection PubMed
description The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, real-time and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads obtained in FASTQ files since most existing tools are either general-purpose or specialized for short read data. We present NanoSpring, a reference-free compressor for nanopore sequencing reads, relying on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole genome data. For recently basecalled high quality nanopore datasets, NanoSpring, which focuses only on the base sequences in the FASTQ file, uses just 0.35–0.65 bits per base which is 3–6× lower than general purpose compressors like gzip. NanoSpring is competitive in compression ratio and compression resource usage with the state-of-the-art tool CoLoRd while being significantly faster at decompression when using multiple threads (>4× faster decompression with 20 threads). NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring.
format Online
Article
Text
id pubmed-9902536
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9902536 2023-02-08 Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach Meng, Qingxi Chandak, Shubham Zhu, Yifan Weissman, Tsachy Sci Rep Article The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, real-time and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads obtained in FASTQ files since most existing tools are either general-purpose or specialized for short read data. We present NanoSpring, a reference-free compressor for nanopore sequencing reads, relying on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole genome data. For recently basecalled high quality nanopore datasets, NanoSpring, which focuses only on the base sequences in the FASTQ file, uses just 0.35–0.65 bits per base which is 3–6× lower than general purpose compressors like gzip. NanoSpring is competitive in compression ratio and compression resource usage with the state-of-the-art tool CoLoRd while being significantly faster at decompression when using multiple threads (>4× faster decompression with 20 threads). NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring.
Nature Publishing Group UK 2023-02-06 /pmc/articles/PMC9902536/ /pubmed/36747011 http://dx.doi.org/10.1038/s41598-023-29267-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Meng, Qingxi
Chandak, Shubham
Zhu, Yifan
Weissman, Tsachy
Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach
title Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach
title_full Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach
title_fullStr Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach
title_full_unstemmed Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach
title_short Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach
title_sort reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9902536/
https://www.ncbi.nlm.nih.gov/pubmed/36747011
http://dx.doi.org/10.1038/s41598-023-29267-8
work_keys_str_mv AT mengqingxi referencefreelosslesscompressionofnanoporesequencingreadsusinganapproximateassemblyapproach
AT chandakshubham referencefreelosslesscompressionofnanoporesequencingreadsusinganapproximateassemblyapproach
AT zhuyifan referencefreelosslesscompressionofnanoporesequencingreadsusinganapproximateassemblyapproach
AT weissmantsachy referencefreelosslesscompressionofnanoporesequencingreadsusinganapproximateassemblyapproach