Compression of next-generation sequencing quality scores using memetic algorithm

BACKGROUND: The exponential growth of next-generation sequencing (NGS) derived DNA data poses great challenges to data storage and transmission. Although many compression algorithms have been proposed for DNA reads in NGS data, few methods are designed specifically to handle the quality scores. RESU...

Bibliographic Details
Main authors: Zhou, Jiarui, Ji, Zhen, Zhu, Zexuan, He, Shan
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271560/
https://www.ncbi.nlm.nih.gov/pubmed/25474747
http://dx.doi.org/10.1186/1471-2105-15-S15-S10
author Zhou, Jiarui
Ji, Zhen
Zhu, Zexuan
He, Shan
collection PubMed
description BACKGROUND: The exponential growth of next-generation sequencing (NGS) derived DNA data poses great challenges to data storage and transmission. Although many compression algorithms have been proposed for DNA reads in NGS data, few methods are designed specifically to handle the quality scores. RESULTS: In this paper we present a memetic algorithm (MA) based compressor for NGS quality score data, named MMQSC. The algorithm extracts raw quality score sequences from FASTQ-formatted files and designs a compression codebook using MA-based multimodal optimization. The input data is then compressed in a substitutional manner. Experimental results on five representative NGS data sets show that MMQSC obtains a higher compression ratio than other state-of-the-art methods. Notably, MMQSC is a lossless, reference-free compression algorithm, yet it obtains an average compression ratio of 22.82% on the experimental data sets. CONCLUSIONS: The proposed MMQSC compresses NGS quality score data effectively. It can be used to improve the overall compression ratio on FASTQ-formatted files.
format Online
Article
Text
id pubmed-4271560
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4271560 2015-01-02
BMC Bioinformatics
Proceedings
BioMed Central 2014-12-03
/pmc/articles/PMC4271560/ /pubmed/25474747 http://dx.doi.org/10.1186/1471-2105-15-S15-S10
Text en
Copyright © 2014 Zhou et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Compression of next-generation sequencing quality scores using memetic algorithm
topic Proceedings