DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique
Genome data are becoming increasingly important for modern medicine. As the rate of increase in DNA sequencing outstrips the rate of increase in disk storage capacity, the storage and data transferring of large genome data are becoming important concerns for biomedical researchers. We propose a two-...
Main authors: | Li, Pinghao; Wang, Shuang; Kim, Jihoon; Xiong, Hongkai; Ohno-Machado, Lucila; Jiang, Xiaoqian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3840021/ https://www.ncbi.nlm.nih.gov/pubmed/24282536 http://dx.doi.org/10.1371/journal.pone.0080377 |
_version_ | 1782478470275661824 |
---|---|
author | Li, Pinghao Wang, Shuang Kim, Jihoon Xiong, Hongkai Ohno-Machado, Lucila Jiang, Xiaoqian |
author_facet | Li, Pinghao Wang, Shuang Kim, Jihoon Xiong, Hongkai Ohno-Machado, Lucila Jiang, Xiaoqian |
author_sort | Li, Pinghao |
collection | PubMed |
description | Genome data are becoming increasingly important for modern medicine. As the rate of increase in DNA sequencing outstrips the rate of increase in disk storage capacity, the storage and data transferring of large genome data are becoming important concerns for biomedical researchers. We propose a two-pass lossless genome compression algorithm, which highlights the synthesis of complementary contextual models, to improve the compression performance. The proposed framework could handle genome compression with and without reference sequences, and demonstrated performance advantages over best existing algorithms. The method for reference-free compression led to bit rates of 1.720 and 1.838 bits per base for bacteria and yeast, which were approximately 3.7% and 2.6% better than the state-of-the-art algorithms. Regarding performance with reference, we tested on the first Korean personal genome sequence data set, and our proposed method demonstrated a 189-fold compression rate, reducing the raw file size from 2986.8 MB to 15.8 MB at a comparable decompression cost with existing algorithms. DNAcompact is freely available at https://sourceforge.net/projects/dnacompact/ for research purpose. |
format | Online Article Text |
id | pubmed-3840021 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3840021 2013-11-26 DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique Li, Pinghao Wang, Shuang Kim, Jihoon Xiong, Hongkai Ohno-Machado, Lucila Jiang, Xiaoqian PLoS One Research Article Genome data are becoming increasingly important for modern medicine. As the rate of increase in DNA sequencing outstrips the rate of increase in disk storage capacity, the storage and data transferring of large genome data are becoming important concerns for biomedical researchers. We propose a two-pass lossless genome compression algorithm, which highlights the synthesis of complementary contextual models, to improve the compression performance. The proposed framework could handle genome compression with and without reference sequences, and demonstrated performance advantages over best existing algorithms. The method for reference-free compression led to bit rates of 1.720 and 1.838 bits per base for bacteria and yeast, which were approximately 3.7% and 2.6% better than the state-of-the-art algorithms. Regarding performance with reference, we tested on the first Korean personal genome sequence data set, and our proposed method demonstrated a 189-fold compression rate, reducing the raw file size from 2986.8 MB to 15.8 MB at a comparable decompression cost with existing algorithms. DNAcompact is freely available at https://sourceforge.net/projects/dnacompact/ for research purpose. Public Library of Science 2013-11-25 /pmc/articles/PMC3840021/ /pubmed/24282536 http://dx.doi.org/10.1371/journal.pone.0080377 Text en © 2013 Li et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Li, Pinghao Wang, Shuang Kim, Jihoon Xiong, Hongkai Ohno-Machado, Lucila Jiang, Xiaoqian DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique |
title | DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique |
title_full | DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique |
title_fullStr | DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique |
title_full_unstemmed | DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique |
title_short | DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique |
title_sort | dna-compact: dna compression based on a pattern-aware contextual modeling technique |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3840021/ https://www.ncbi.nlm.nih.gov/pubmed/24282536 http://dx.doi.org/10.1371/journal.pone.0080377 |
work_keys_str_mv | AT lipinghao dnacompactdnacompressionbasedonapatternawarecontextualmodelingtechnique AT wangshuang dnacompactdnacompressionbasedonapatternawarecontextualmodelingtechnique AT kimjihoon dnacompactdnacompressionbasedonapatternawarecontextualmodelingtechnique AT xionghongkai dnacompactdnacompressionbasedonapatternawarecontextualmodelingtechnique AT ohnomachadolucila dnacompactdnacompressionbasedonapatternawarecontextualmodelingtechnique AT jiangxiaoqian dnacompactdnacompressionbasedonapatternawarecontextualmodelingtechnique |
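
The compression figures quoted in the description field can be checked with a little arithmetic. The sketch below is illustrative only and is not part of DNA-COMPACT: the function names are hypothetical, and the 2-bits-per-base baseline (a plain encoding of the four nucleotides A, C, G, T) is an assumption introduced here for comparison, not a figure from the record.

```python
# Figures quoted from the abstract in the description field.
RAW_MB = 2986.8          # raw size of the Korean personal genome data set
COMPRESSED_MB = 15.8     # size after reference-based compression
BACTERIA_BPB = 1.720     # reference-free bit rate, bits per base
YEAST_BPB = 1.838        # reference-free bit rate, bits per base


def compression_fold(raw_size, compressed_size):
    """Return how many times smaller the compressed file is than the raw one."""
    return raw_size / compressed_size


def saving_vs_two_bit(bits_per_base, baseline_bits=2.0):
    """Fractional saving relative to an ASSUMED plain 2-bit-per-base encoding."""
    return 1.0 - bits_per_base / baseline_bits


if __name__ == "__main__":
    fold = compression_fold(RAW_MB, COMPRESSED_MB)
    print(f"reference-based compression: about {fold:.1f}-fold")
    for name, bpb in (("bacteria", BACTERIA_BPB), ("yeast", YEAST_BPB)):
        print(f"{name}: {saving_vs_two_bit(bpb):.1%} smaller than 2 bits/base")
```

The ratio 2986.8 / 15.8 rounds to 189, which agrees with the 189-fold figure in the abstract. The reference-free savings are computed against the assumed 2-bit baseline only, so they are not comparable to the 3.7% and 2.6% improvements the abstract reports over other algorithms.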