BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs
Dramatic increases in data produced by next-generation sequencing (NGS) technologies demand data compression tools for saving storage space. However, effective and efficient data compression for genome sequencing data has remained an unresolved challenge in NGS data studies. In this paper, we propos...
Main Authors: | Wang, Rongjie; Li, Junyi; Bai, Yang; Zang, Tianyi; Wang, Yadong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2018 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6197042/ https://www.ncbi.nlm.nih.gov/pubmed/30364599 http://dx.doi.org/10.7717/peerj.5611 |
_version_ | 1783364678757384192 |
---|---|
author | Wang, Rongjie; Li, Junyi; Bai, Yang; Zang, Tianyi; Wang, Yadong |
author_facet | Wang, Rongjie; Li, Junyi; Bai, Yang; Zang, Tianyi; Wang, Yadong |
author_sort | Wang, Rongjie |
collection | PubMed |
description | Dramatic increases in data produced by next-generation sequencing (NGS) technologies demand data compression tools for saving storage space. However, effective and efficient data compression for genome sequencing data has remained an unresolved challenge in NGS data studies. In this paper, we propose a novel alignment-free and reference-free compression method, BdBG, which is the first to compress genome sequencing data with dynamic de Bruijn graphs based on the data after bucketing. Compared with existing de Bruijn graph methods, BdBG stores only a list of bucket indexes and bifurcations for the raw read sequences, and this feature can effectively reduce storage space. Experimental results on several genome sequencing datasets show the effectiveness of BdBG over three state-of-the-art methods. BdBG is written in Python and is open-source software distributed under the MIT license, available for download at https://github.com/rongjiewang/BdBG. |
format | Online Article Text |
id | pubmed-6197042 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6197042 2018-10-24 BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs Wang, Rongjie; Li, Junyi; Bai, Yang; Zang, Tianyi; Wang, Yadong PeerJ Bioinformatics Dramatic increases in data produced by next-generation sequencing (NGS) technologies demand data compression tools for saving storage space. However, effective and efficient data compression for genome sequencing data has remained an unresolved challenge in NGS data studies. In this paper, we propose a novel alignment-free and reference-free compression method, BdBG, which is the first to compress genome sequencing data with dynamic de Bruijn graphs based on the data after bucketing. Compared with existing de Bruijn graph methods, BdBG stores only a list of bucket indexes and bifurcations for the raw read sequences, and this feature can effectively reduce storage space. Experimental results on several genome sequencing datasets show the effectiveness of BdBG over three state-of-the-art methods. BdBG is written in Python and is open-source software distributed under the MIT license, available for download at https://github.com/rongjiewang/BdBG. PeerJ Inc. 2018-10-19 /pmc/articles/PMC6197042/ /pubmed/30364599 http://dx.doi.org/10.7717/peerj.5611 Text en ©2018 Wang et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Wang, Rongjie Li, Junyi Bai, Yang Zang, Tianyi Wang, Yadong BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs |
title | BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs |
title_full | BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs |
title_fullStr | BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs |
title_full_unstemmed | BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs |
title_short | BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs |
title_sort | bdbg: a bucket-based method for compressing genome sequencing data with dynamic de bruijn graphs |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6197042/ https://www.ncbi.nlm.nih.gov/pubmed/30364599 http://dx.doi.org/10.7717/peerj.5611 |
work_keys_str_mv | AT wangrongjie bdbgabucketbasedmethodforcompressinggenomesequencingdatawithdynamicdebruijngraphs AT lijunyi bdbgabucketbasedmethodforcompressinggenomesequencingdatawithdynamicdebruijngraphs AT baiyang bdbgabucketbasedmethodforcompressinggenomesequencingdatawithdynamicdebruijngraphs AT zangtianyi bdbgabucketbasedmethodforcompressinggenomesequencingdatawithdynamicdebruijngraphs AT wangyadong bdbgabucketbasedmethodforcompressinggenomesequencingdatawithdynamicdebruijngraphs |