Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach
The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are por...
Main authors: | Meng, Qingxi; Chandak, Shubham; Zhu, Yifan; Weissman, Tsachy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9902536/ https://www.ncbi.nlm.nih.gov/pubmed/36747011 http://dx.doi.org/10.1038/s41598-023-29267-8 |
_version_ | 1784883284195409920 |
---|---|
author | Meng, Qingxi Chandak, Shubham Zhu, Yifan Weissman, Tsachy |
author_facet | Meng, Qingxi Chandak, Shubham Zhu, Yifan Weissman, Tsachy |
author_sort | Meng, Qingxi |
collection | PubMed |
description | The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, real-time and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads obtained in FASTQ files since most existing tools are either general-purpose or specialized for short read data. We present NanoSpring, a reference-free compressor for nanopore sequencing reads, relying on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole genome data. For recently basecalled high quality nanopore datasets, NanoSpring, which focuses only on the base sequences in the FASTQ file, uses just 0.35–0.65 bits per base which is 3–6× lower than general purpose compressors like gzip. NanoSpring is competitive in compression ratio and compression resource usage with the state-of-the-art tool CoLoRd while being significantly faster at decompression when using multiple threads (> 4× faster decompression with 20 threads). NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring. |
format | Online Article Text |
id | pubmed-9902536 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9902536 2023-02-08 Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach Meng, Qingxi Chandak, Shubham Zhu, Yifan Weissman, Tsachy Sci Rep Article The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, real-time and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads obtained in FASTQ files since most existing tools are either general-purpose or specialized for short read data. We present NanoSpring, a reference-free compressor for nanopore sequencing reads, relying on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole genome data. For recently basecalled high quality nanopore datasets, NanoSpring, which focuses only on the base sequences in the FASTQ file, uses just 0.35–0.65 bits per base which is 3–6× lower than general purpose compressors like gzip. NanoSpring is competitive in compression ratio and compression resource usage with the state-of-the-art tool CoLoRd while being significantly faster at decompression when using multiple threads (> 4× faster decompression with 20 threads). NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring.
Nature Publishing Group UK 2023-02-06 /pmc/articles/PMC9902536/ /pubmed/36747011 http://dx.doi.org/10.1038/s41598-023-29267-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9902536/ https://www.ncbi.nlm.nih.gov/pubmed/36747011 http://dx.doi.org/10.1038/s41598-023-29267-8 |
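
The abstract reports compression performance in bits per base. As a minimal sketch of how that metric is computed, the Python below divides the compressed size in bits by the number of bases and applies it to gzip from the standard library. The helper name `bits_per_base` and the synthetic random sequence are assumptions made for illustration; they are not part of NanoSpring or of the paper's evaluation. Random bases also lack the overlap between reads that real sequencing data has, so the figure printed here says nothing about gzip or NanoSpring on actual nanopore datasets.

```python
import gzip
import random


def bits_per_base(num_compressed_bytes: int, num_bases: int) -> float:
    """Return the compressed size in bits divided by the number of bases."""
    if num_bases <= 0:
        raise ValueError("num_bases must be positive")
    return 8 * num_compressed_bytes / num_bases


# Hypothetical input: 100,000 uniformly random bases (not real read data).
rng = random.Random(0)
bases = "".join(rng.choice("ACGT") for _ in range(100_000))

# Compress only the base sequence, mirroring the abstract's focus on bases.
compressed = gzip.compress(bases.encode("ascii"))
gzip_bpb = bits_per_base(len(compressed), len(bases))

print(f"gzip on random bases: {gzip_bpb:.2f} bits per base")
```

To apply the same calculation to a real compressor, pass the size of its output file in bytes and the total number of bases in the input FASTQ file to `bits_per_base`.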