
MAFCO: A Compression Tool for MAF Files

In the last decade, the cost of genomic sequencing has been decreasing so much that researchers all over the world accumulate huge amounts of data for present and future use. These genomic data need to be efficiently stored, because storage cost is not decreasing as fast as the cost of sequencing. I...


Bibliographic Details
Main authors: Matos, Luís M. O., Neves, António J. R., Pratas, Diogo, Pinho, Armando J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4376647/
https://www.ncbi.nlm.nih.gov/pubmed/25816229
http://dx.doi.org/10.1371/journal.pone.0116082
author Matos, Luís M. O.
Neves, António J. R.
Pratas, Diogo
Pinho, Armando J.
collection PubMed
description In the last decade, the cost of genomic sequencing has been decreasing so much that researchers all over the world accumulate huge amounts of data for present and future use. These genomic data need to be efficiently stored, because storage cost is not decreasing as fast as the cost of sequencing. In order to overcome this problem, the most popular general-purpose compression tool, gzip, is usually used. However, these tools were not specifically designed to compress this kind of data, and often fall short when the intention is to reduce the data size as much as possible. There are several compression algorithms available, even for genomic data, but very few have been designed to deal with Whole Genome Alignments, containing alignments between entire genomes of several species. In this paper, we present a lossless compression tool, MAFCO, specifically designed to compress MAF (Multiple Alignment Format) files. Compared to gzip, the proposed tool attains a compression gain from 34% to 57%, depending on the data set. When compared to a recent dedicated method, which is not compatible with some data sets, the compression gain of MAFCO is about 9%. Both source-code and binaries for several operating systems are freely available for non-commercial use at: http://bioinformatics.ua.pt/software/mafco.
format Online
Article
Text
id pubmed-4376647
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4376647 2015-04-04 PLoS One Research Article Public Library of Science 2015-03-27 /pmc/articles/PMC4376647/ /pubmed/25816229 http://dx.doi.org/10.1371/journal.pone.0116082 Text en © 2015 Matos et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title MAFCO: A Compression Tool for MAF Files
topic Research Article