MFCompress: a compression tool for FASTA and multi-FASTA data
Main authors: | Pinho, Armando J.; Pratas, Diogo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866555/ https://www.ncbi.nlm.nih.gov/pubmed/24132931 http://dx.doi.org/10.1093/bioinformatics/btt594 |
_version_ | 1782296178604376064 |
---|---|
author | Pinho, Armando J.; Pratas, Diogo |
collection | PubMed |
description | Motivation: The data deluge phenomenon is becoming a serious problem in most genomic centers. To alleviate it, general purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce as much as possible the data, for example, for medium- and long-term storage. A number of algorithms have been proposed for the compression of genomics data, but unfortunately only a few of them have been made available as usable and reliable compression tools. Results: In this article, we describe one such tool, MFCompress, specially designed for the compression of FASTA and multi-FASTA files. In comparison to gzip and applied to multi-FASTA files, MFCompress can provide additional average compression gains of almost 50%, i.e. it potentially doubles the available storage, although at the cost of some more computation time. On highly redundant datasets, and in comparison with gzip, 8-fold size reductions have been obtained. Availability: Both source code and binaries for several operating systems are freely available for non-commercial use at http://bioinformatics.ua.pt/software/mfcompress/. Contact: ap@ua.pt Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-3866555 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3866555 2013-12-18 Bioinformatics Applications Notes Oxford University Press 2014-01-01 2013-10-16 /pmc/articles/PMC3866555/ /pubmed/24132931 http://dx.doi.org/10.1093/bioinformatics/btt594 Text en © The Author 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | MFCompress: a compression tool for FASTA and multi-FASTA data |
topic | Applications Notes |