GTZ: a fast compression and cloud transmission tool optimized for FASTQ files
BACKGROUND: The dramatic development of DNA sequencing technology is generating real big data, craving for more storage and bandwidth. To speed up data sharing and bring data to computing resource faster and cheaper, it is necessary to develop a compression tool that can support efficient compressio...
Main Authors: | Xing, Yuting; Li, Gen; Wang, Zhenguo; Feng, Bolun; Song, Zhuo; Wu, Chengkun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751770/ https://www.ncbi.nlm.nih.gov/pubmed/29297296 http://dx.doi.org/10.1186/s12859-017-1973-5 |
_version_ | 1783290014318198784 |
---|---|
author | Xing, Yuting Li, Gen Wang, Zhenguo Feng, Bolun Song, Zhuo Wu, Chengkun |
author_facet | Xing, Yuting Li, Gen Wang, Zhenguo Feng, Bolun Song, Zhuo Wu, Chengkun |
author_sort | Xing, Yuting |
collection | PubMed |
description | BACKGROUND: The dramatic development of DNA sequencing technology is generating real big data, craving for more storage and bandwidth. To speed up data sharing and bring data to computing resource faster and cheaper, it is necessary to develop a compression tool that can support efficient compression and transmission of sequencing data onto the cloud storage. RESULTS: This paper presents GTZ, a compression and transmission tool, optimized for FASTQ files. As a reference-free lossless FASTQ compressor, GTZ treats different lines of FASTQ separately, utilizes adaptive context modelling to estimate their characteristic probabilities, and compresses data blocks with arithmetic coding. GTZ can also be used to compress multiple files or directories at once. Furthermore, as a tool to be used in the cloud computing era, it is capable of saving compressed data locally or transmitting data directly into the cloud by choice. We evaluated the performance of GTZ on some diverse FASTQ benchmarks. Results show that in most cases, it outperforms many other tools in terms of the compression ratio, speed and stability. CONCLUSIONS: GTZ is a tool that enables efficient lossless FASTQ data compression and simultaneous data transmission onto the cloud. It emerges as a useful tool for NGS data storage and transmission in the cloud environment. GTZ is freely available online at: https://github.com/Genetalks/gtz. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1973-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5751770 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5751770 2018-01-05 GTZ: a fast compression and cloud transmission tool optimized for FASTQ files Xing, Yuting Li, Gen Wang, Zhenguo Feng, Bolun Song, Zhuo Wu, Chengkun BMC Bioinformatics Research BACKGROUND: The dramatic development of DNA sequencing technology is generating real big data, craving for more storage and bandwidth. To speed up data sharing and bring data to computing resource faster and cheaper, it is necessary to develop a compression tool that can support efficient compression and transmission of sequencing data onto the cloud storage. RESULTS: This paper presents GTZ, a compression and transmission tool, optimized for FASTQ files. As a reference-free lossless FASTQ compressor, GTZ treats different lines of FASTQ separately, utilizes adaptive context modelling to estimate their characteristic probabilities, and compresses data blocks with arithmetic coding. GTZ can also be used to compress multiple files or directories at once. Furthermore, as a tool to be used in the cloud computing era, it is capable of saving compressed data locally or transmitting data directly into the cloud by choice. We evaluated the performance of GTZ on some diverse FASTQ benchmarks. Results show that in most cases, it outperforms many other tools in terms of the compression ratio, speed and stability. CONCLUSIONS: GTZ is a tool that enables efficient lossless FASTQ data compression and simultaneous data transmission onto the cloud. It emerges as a useful tool for NGS data storage and transmission in the cloud environment. GTZ is freely available online at: https://github.com/Genetalks/gtz. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1973-5) contains supplementary material, which is available to authorized users. BioMed Central 2017-12-28 /pmc/articles/PMC5751770/ /pubmed/29297296 http://dx.doi.org/10.1186/s12859-017-1973-5 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Xing, Yuting Li, Gen Wang, Zhenguo Feng, Bolun Song, Zhuo Wu, Chengkun GTZ: a fast compression and cloud transmission tool optimized for FASTQ files |
title | GTZ: a fast compression and cloud transmission tool optimized for FASTQ files |
title_full | GTZ: a fast compression and cloud transmission tool optimized for FASTQ files |
title_fullStr | GTZ: a fast compression and cloud transmission tool optimized for FASTQ files |
title_full_unstemmed | GTZ: a fast compression and cloud transmission tool optimized for FASTQ files |
title_short | GTZ: a fast compression and cloud transmission tool optimized for FASTQ files |
title_sort | gtz: a fast compression and cloud transmission tool optimized for fastq files |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751770/ https://www.ncbi.nlm.nih.gov/pubmed/29297296 http://dx.doi.org/10.1186/s12859-017-1973-5 |
work_keys_str_mv | AT xingyuting gtzafastcompressionandcloudtransmissiontooloptimizedforfastqfiles AT ligen gtzafastcompressionandcloudtransmissiontooloptimizedforfastqfiles AT wangzhenguo gtzafastcompressionandcloudtransmissiontooloptimizedforfastqfiles AT fengbolun gtzafastcompressionandcloudtransmissiontooloptimizedforfastqfiles AT songzhuo gtzafastcompressionandcloudtransmissiontooloptimizedforfastqfiles AT wuchengkun gtzafastcompressionandcloudtransmissiontooloptimizedforfastqfiles |
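
The description field in this record says that GTZ treats the different lines of a FASTQ record separately, models them with adaptive context modelling, and compresses blocks with arithmetic coding. The Python sketch below only illustrates that general idea and is not GTZ's implementation. It splits FASTQ text into identifier, base, and quality streams, then estimates the ideal code length that an arithmetic coder could approach under a simple adaptive order-k model. The function names, the Laplace smoothing, the context order, and the 256-symbol alphabet are all assumptions made for illustration; no actual arithmetic coder is implemented.

```python
import math
from collections import defaultdict


def split_fastq_streams(text):
    """Split FASTQ text into identifier, base, and quality streams.

    Hypothetical helper: assumes well-formed, four-line FASTQ records.
    """
    lines = text.strip().split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have four lines each")
    streams = {"ids": [], "bases": [], "quals": []}
    for i in range(0, len(lines), 4):
        streams["ids"].append(lines[i])        # line 1: @identifier
        streams["bases"].append(lines[i + 1])  # line 2: nucleotide sequence
        streams["quals"].append(lines[i + 3])  # line 4: quality scores
    return streams


def adaptive_code_length_bits(stream, order=2, alphabet_size=256):
    """Ideal code length (bits) of a stream under an adaptive order-k model.

    Each symbol's probability is estimated from counts seen so far in the
    same context (the previous `order` symbols), with add-one smoothing.
    The model is updated after every symbol, so a decoder could mirror it.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for line in stream:
        context = ""  # context restarts at each line
        for symbol in line + "\n":
            p = (counts[context][symbol] + 1) / (totals[context] + alphabet_size)
            bits += -math.log2(p)
            counts[context][symbol] += 1
            totals[context] += 1
            context = (context + symbol)[-order:] if order > 0 else ""
    return bits


if __name__ == "__main__":
    sample = "@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nIIIH\n"
    for name, stream in split_fastq_streams(sample).items():
        print(name, round(adaptive_code_length_bits(stream), 1), "bits")
```

Modelling each stream with its own counts reflects the observation that identifiers, bases, and quality scores have very different statistics, so a shared model would tend to predict all three less well.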