LFastqC: A lossless non-reference-based FASTQ compressor

The cost-effectiveness of next-generation sequencing (NGS) has led to the advancement of genomic research, thereby regularly generating a large amount of raw data that often requires efficient infrastructures such as data centers to manage the storage and transmission of such data. The generated NGS...

Bibliographic Details
Main authors: Al Yami, Sultan; Huang, Chun-Hsi
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6855649/
https://www.ncbi.nlm.nih.gov/pubmed/31725736
http://dx.doi.org/10.1371/journal.pone.0224806
author Al Yami, Sultan
Huang, Chun-Hsi
collection PubMed
description The cost-effectiveness of next-generation sequencing (NGS) has led to the advancement of genomic research, thereby regularly generating a large amount of raw data that often requires efficient infrastructures such as data centers to manage the storage and transmission of such data. The generated NGS data are highly redundant and need to be efficiently compressed to reduce the cost of storage space and transmission bandwidth. We present a lossless, non-reference-based FASTQ compression algorithm, known as LFastqC, an improvement over the LFQC tool, to address these issues. LFastqC is compared with several state-of-the-art compressors, and the results indicate that LFastqC achieves better compression ratios for important datasets such as the LS454, PacBio, and MinION. Moreover, LFastqC has a better compression and decompression speed than LFQC, which was previously the top-performing compression algorithm for the LS454 dataset. LFastqC is freely available at https://github.uconn.edu/sya12005/LFastqC.
format Online
Article
Text
id pubmed-6855649
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6855649 2019-12-06 LFastqC: A lossless non-reference-based FASTQ compressor; Al Yami, Sultan; Huang, Chun-Hsi; PLoS One; Research Article; Public Library of Science 2019-11-14; /pmc/articles/PMC6855649/ /pubmed/31725736 http://dx.doi.org/10.1371/journal.pone.0224806; Text en; © 2019 Al Yami, Huang http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title LFastqC: A lossless non-reference-based FASTQ compressor
topic Research Article