
GTZ: a fast compression and cloud transmission tool optimized for FASTQ files

BACKGROUND: The rapid development of DNA sequencing technology is generating truly big data that demands ever more storage and bandwidth. To speed up data sharing and bring data to computing resources faster and more cheaply, it is necessary to develop a compression tool that can support efficient compressio...


Bibliographic Details
Main Authors: Xing, Yuting, Li, Gen, Wang, Zhenguo, Feng, Bolun, Song, Zhuo, Wu, Chengkun
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751770/
https://www.ncbi.nlm.nih.gov/pubmed/29297296
http://dx.doi.org/10.1186/s12859-017-1973-5
_version_ 1783290014318198784
author Xing, Yuting
Li, Gen
Wang, Zhenguo
Feng, Bolun
Song, Zhuo
Wu, Chengkun
author_facet Xing, Yuting
Li, Gen
Wang, Zhenguo
Feng, Bolun
Song, Zhuo
Wu, Chengkun
author_sort Xing, Yuting
collection PubMed
description BACKGROUND: The rapid development of DNA sequencing technology is generating truly big data that demands ever more storage and bandwidth. To speed up data sharing and bring data to computing resources faster and more cheaply, it is necessary to develop a compression tool that can support efficient compression and transmission of sequencing data to cloud storage. RESULTS: This paper presents GTZ, a compression and transmission tool optimized for FASTQ files. As a reference-free lossless FASTQ compressor, GTZ treats the different lines of FASTQ records separately, uses adaptive context modelling to estimate their characteristic probabilities, and compresses data blocks with arithmetic coding. GTZ can also compress multiple files or directories at once. Furthermore, as a tool designed for the cloud computing era, it can either save compressed data locally or transmit it directly to the cloud, as the user chooses. We evaluated the performance of GTZ on a diverse set of FASTQ benchmarks. The results show that in most cases it outperforms many other tools in compression ratio, speed and stability. CONCLUSIONS: GTZ is a tool that enables efficient lossless FASTQ data compression and simultaneous data transmission to the cloud. It emerges as a useful tool for NGS data storage and transmission in the cloud environment. GTZ is freely available online at: https://github.com/Genetalks/gtz. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1973-5) contains supplementary material, which is available to authorized users.
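The description says that GTZ handles the different lines of FASTQ records separately, estimates symbol probabilities with adaptive context modelling, and then applies arithmetic coding. The Python sketch below only illustrates that general idea and is not GTZ's implementation: the function names `split_fastq_streams` and `adaptive_code_length`, the order-2 context, and the add-one smoothing are all assumptions made for illustration. Instead of running a real arithmetic coder, it sums the ideal code length (minus log2 of each predicted probability), which an arithmetic coder approaches closely.

```python
import math
from collections import defaultdict


def split_fastq_streams(text):
    """Split FASTQ text into identifier, sequence, and quality streams.

    Hypothetical helper: assumes well-formed 4-line records.
    """
    lines = text.strip().split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    ids = lines[0::4]    # '@' identifier lines
    seqs = lines[1::4]   # base calls
    quals = lines[3::4]  # quality strings (the '+' separator lines are skipped)
    return ids, seqs, quals


def adaptive_code_length(symbols, alphabet_size, order=2):
    """Ideal code length in bits under an adaptive order-`order` context model.

    Each symbol's probability is estimated from the counts seen so far in the
    same context (add-one smoothing, an assumption of this sketch), and the
    counts are updated only afterwards, so a decoder could rebuild the model.
    `alphabet_size` must be at least the number of distinct symbols.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    context = ""
    for s in symbols:
        p = (counts[context][s] + 1) / (totals[context] + alphabet_size)
        bits += -math.log2(p)
        counts[context][s] += 1
        totals[context] += 1
        context = (context + s)[-order:] if order > 0 else ""
    return bits


# Toy input: two short reads, invented for this example.
fastq = "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACGTACGTAC\n+\nIIIIHHHHII\n"
ids, seqs, quals = split_fastq_streams(fastq)
seq_bits = adaptive_code_length("".join(seqs), alphabet_size=5)    # A, C, G, T, N
qual_bits = adaptive_code_length("".join(quals), alphabet_size=94)  # printable Phred+33 range
```

Modelling each stream on its own lets the sequence model work with a five-symbol alphabet while the quality model uses a much larger one, which is the usual motivation for separating the streams.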
format Online
Article
Text
id pubmed-5751770
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5751770 2018-01-05 GTZ: a fast compression and cloud transmission tool optimized for FASTQ files Xing, Yuting Li, Gen Wang, Zhenguo Feng, Bolun Song, Zhuo Wu, Chengkun BMC Bioinformatics Research BACKGROUND: The rapid development of DNA sequencing technology is generating truly big data that demands ever more storage and bandwidth. To speed up data sharing and bring data to computing resources faster and more cheaply, it is necessary to develop a compression tool that can support efficient compression and transmission of sequencing data to cloud storage. RESULTS: This paper presents GTZ, a compression and transmission tool optimized for FASTQ files. As a reference-free lossless FASTQ compressor, GTZ treats the different lines of FASTQ records separately, uses adaptive context modelling to estimate their characteristic probabilities, and compresses data blocks with arithmetic coding. GTZ can also compress multiple files or directories at once. Furthermore, as a tool designed for the cloud computing era, it can either save compressed data locally or transmit it directly to the cloud, as the user chooses. We evaluated the performance of GTZ on a diverse set of FASTQ benchmarks. The results show that in most cases it outperforms many other tools in compression ratio, speed and stability. CONCLUSIONS: GTZ is a tool that enables efficient lossless FASTQ data compression and simultaneous data transmission to the cloud. It emerges as a useful tool for NGS data storage and transmission in the cloud environment. GTZ is freely available online at: https://github.com/Genetalks/gtz. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1973-5) contains supplementary material, which is available to authorized users. BioMed Central 2017-12-28 /pmc/articles/PMC5751770/ /pubmed/29297296 http://dx.doi.org/10.1186/s12859-017-1973-5 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Xing, Yuting
Li, Gen
Wang, Zhenguo
Feng, Bolun
Song, Zhuo
Wu, Chengkun
GTZ: a fast compression and cloud transmission tool optimized for FASTQ files
title GTZ: a fast compression and cloud transmission tool optimized for FASTQ files
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751770/
https://www.ncbi.nlm.nih.gov/pubmed/29297296
http://dx.doi.org/10.1186/s12859-017-1973-5