A compression method for DNA

The development of high-throughput sequencing technology has generated huge amounts of DNA data. Many general-purpose compression algorithms, such as the LZ77 algorithm, are not ideal for compressing DNA data. On the basis of Nour and Sharawi’s method, we propose a new, lossless and reference-free method to incr...
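To illustrate why a DNA-aware step can help a general-purpose compressor, the following minimal sketch packs each base into 2 bits before handing the bytes to `zlib`. This is not the authors' method and does not reproduce their file layout: it assumes the input contains only the letters A, C, G and T, uses Python's `zlib` (DEFLATE, which combines LZ77 with Huffman coding) as a stand-in for plain LZ77, and every function name here is hypothetical.

```python
import zlib

# Assumption: input sequences contain only the four letters A, C, G, T.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack a sequence into 2 bits per base, prefixed by an 8-byte length."""
    out = bytearray(len(seq).to_bytes(8, "big"))
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpacking reads bases in order.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes) -> str:
    """Invert pack_2bit, using the stored length to drop padding bases."""
    n = int.from_bytes(data[:8], "big")
    bases = []
    for byte in data[8:]:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:n])


def compress(seq: str) -> bytes:
    """2-bit packing followed by DEFLATE (stand-in for an LZ77 stage)."""
    return zlib.compress(pack_2bit(seq), 9)


def decompress(blob: bytes) -> str:
    """Lossless inverse of compress."""
    return unpack_2bit(zlib.decompress(blob))
```

A round trip such as `decompress(compress("ACGTTGCA"))` returns the original string, so the sketch is lossless and needs no reference genome. Real sequence files also contain headers, line breaks and other symbols such as N, which this sketch deliberately ignores.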


Bibliographic Details
Main authors: Du, Shengwang, Li, Junyi, Bian, Naizheng
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7688149/
https://www.ncbi.nlm.nih.gov/pubmed/33237908
http://dx.doi.org/10.1371/journal.pone.0238220
_version_ 1783613652831567872
author Du, Shengwang
Li, Junyi
Bian, Naizheng
author_facet Du, Shengwang
Li, Junyi
Bian, Naizheng
author_sort Du, Shengwang
collection PubMed
description The development of high-throughput sequencing technology has generated huge amounts of DNA data. Many general-purpose compression algorithms, such as the LZ77 algorithm, are not ideal for compressing DNA data. On the basis of Nour and Sharawi’s method, we propose a new, lossless and reference-free method to increase the compression performance. The original sequences are converted into eight intermediate files and six final files. Then, the LZ77 algorithm is used to compress the six final files. The results show that the compression time is decreased by 83% and the decompression time is decreased by 54% on average. The compression rate is almost the same as that of Nour and Sharawi’s method, which is the fastest method so far. Moreover, our method has a wider range of applications than Nour and Sharawi’s method. Compared with some advanced current compression tools, such as XM and FCM-Mx, our method's compression time is much shorter, decreasing by more than 90% on average.
format Online
Article
Text
id pubmed-7688149
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7688149 2020-12-05 A compression method for DNA Du, Shengwang Li, Junyi Bian, Naizheng PLoS One Research Article The development of high-throughput sequencing technology has generated huge amounts of DNA data. Many general-purpose compression algorithms, such as the LZ77 algorithm, are not ideal for compressing DNA data. On the basis of Nour and Sharawi’s method, we propose a new, lossless and reference-free method to increase the compression performance. The original sequences are converted into eight intermediate files and six final files. Then, the LZ77 algorithm is used to compress the six final files. The results show that the compression time is decreased by 83% and the decompression time is decreased by 54% on average. The compression rate is almost the same as that of Nour and Sharawi’s method, which is the fastest method so far. Moreover, our method has a wider range of applications than Nour and Sharawi’s method. Compared with some advanced current compression tools, such as XM and FCM-Mx, our method's compression time is much shorter, decreasing by more than 90% on average. Public Library of Science 2020-11-25 /pmc/articles/PMC7688149/ /pubmed/33237908 http://dx.doi.org/10.1371/journal.pone.0238220 Text en © 2020 Du et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Du, Shengwang
Li, Junyi
Bian, Naizheng
A compression method for DNA
title A compression method for DNA
title_full A compression method for DNA
title_fullStr A compression method for DNA
title_full_unstemmed A compression method for DNA
title_short A compression method for DNA
title_sort compression method for dna
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7688149/
https://www.ncbi.nlm.nih.gov/pubmed/33237908
http://dx.doi.org/10.1371/journal.pone.0238220