MFCompress: a compression tool for FASTA and multi-FASTA data

Motivation: The data deluge phenomenon is becoming a serious problem in most genomic centers. To alleviate it, general purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce as much as possible t...

Bibliographic Details
Main authors: Pinho, Armando J., Pratas, Diogo
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866555/
https://www.ncbi.nlm.nih.gov/pubmed/24132931
http://dx.doi.org/10.1093/bioinformatics/btt594
author Pinho, Armando J.
Pratas, Diogo
collection PubMed
description Motivation: The data deluge phenomenon is becoming a serious problem in most genomic centers. To alleviate it, general purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce the data as much as possible, for example, for medium- and long-term storage. A number of algorithms have been proposed for the compression of genomics data, but unfortunately only a few of them have been made available as usable and reliable compression tools. Results: In this article, we describe one such tool, MFCompress, specially designed for the compression of FASTA and multi-FASTA files. In comparison to gzip and applied to multi-FASTA files, MFCompress can provide additional average compression gains of almost 50%, i.e. it potentially doubles the available storage, although at the cost of some more computation time. On highly redundant datasets, and in comparison with gzip, 8-fold size reductions have been obtained. Availability: Both source code and binaries for several operating systems are freely available for non-commercial use at http://bioinformatics.ua.pt/software/mfcompress/. Contact: ap@ua.pt Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-3866555
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3866555 2013-12-18 MFCompress: a compression tool for FASTA and multi-FASTA data Pinho, Armando J. Pratas, Diogo Bioinformatics Applications Notes Oxford University Press 2014-01-01 2013-10-16 /pmc/articles/PMC3866555/ /pubmed/24132931 http://dx.doi.org/10.1093/bioinformatics/btt594 Text en
© The Author 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title MFCompress: a compression tool for FASTA and multi-FASTA data
topic Applications Notes