
HapZipper: sharing HapMap populations just got easier

Bibliographic Details
Main authors: Chanda, Pritam; Elhaik, Eran; Bader, Joel S.
Format: Online; Article; Text
Language: English
Published: Oxford University Press 2012
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3488212/
https://www.ncbi.nlm.nih.gov/pubmed/22844100
http://dx.doi.org/10.1093/nar/gks709
Collection: PubMed
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nucleic Acids Res (section: Methods Online)
Dates: 2012-11 (issue), 2012-07-27 (online)
Copyright: © The Author(s) 2012. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Description
The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to compress HapMap data beyond the benchmarks set by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to less than 5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz2.
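The abstract reports HapZipper's result as a percentage of the original file size and compares it against generic tools (gzip, bzip2 and lzma). As a rough sketch of how that generic baseline can be measured, the snippet below uses Python's standard-library `gzip`, `bz2` and `lzma` modules (Python 3.9+ assumed). It does not implement HapZipper or reflect its algorithm; the function name `compressed_percent` and the toy genotype rows are hypothetical, and the rows are not real HapMap data.

```python
import bz2
import gzip
import lzma


def compressed_percent(data: bytes) -> dict[str, float]:
    """Return each generic compressor's output size as a percentage of the input size.

    Hypothetical helper for illustration only; this is NOT HapZipper.
    """
    if not data:
        raise ValueError("data must be non-empty")
    # Compress the same bytes with each generic, lossless tool named in the abstract.
    outputs = {
        "gzip": gzip.compress(data),
        "bzip2": bz2.compress(data),
        "lzma": lzma.compress(data),
    }
    # Report compressed size relative to the original, in percent.
    return {name: 100.0 * len(out) / len(data) for name, out in outputs.items()}


if __name__ == "__main__":
    # Hypothetical toy input: repetitive genotype-like text lines, NOT real HapMap data.
    rows = [f"rs{1000 + i} chr1 {5000 + 37 * i} A/G AA AG GG AA\n" for i in range(2000)]
    sample = "".join(rows).encode("ascii")
    for name, pct in compressed_percent(sample).items():
        print(f"{name}: {pct:.1f}% of original size")
    # Lossless check: decompression must restore the exact input bytes.
    assert lzma.decompress(lzma.compress(sample)) == sample
```

To apply the same measurement to an actual file, read it as bytes first (for example with `open(path, "rb").read()`) and pass the result to the function; a dedicated tool would be judged by whether its percentage falls below all three generic figures.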