HapZipper: sharing HapMap populations just got easier
The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data...
Main Authors: | Chanda, Pritam; Elhaik, Eran; Bader, Joel S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2012 |
Subjects: | Methods Online |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3488212/ https://www.ncbi.nlm.nih.gov/pubmed/22844100 http://dx.doi.org/10.1093/nar/gks709 |
_version_ | 1782248583037190144 |
---|---|
author | Chanda, Pritam; Elhaik, Eran; Bader, Joel S. |
author_sort | Chanda, Pritam |
collection | PubMed |
description | The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to <5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz2. |
format | Online Article Text |
id | pubmed-3488212 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3488212 2012-11-06 HapZipper: sharing HapMap populations just got easier Chanda, Pritam; Elhaik, Eran; Bader, Joel S. Nucleic Acids Res Methods Online The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to <5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz2. Oxford University Press 2012-11 2012-07-27 /pmc/articles/PMC3488212/ /pubmed/22844100 http://dx.doi.org/10.1093/nar/gks709 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Chanda, Pritam Elhaik, Eran Bader, Joel S. HapZipper: sharing HapMap populations just got easier |
title | HapZipper: sharing HapMap populations just got easier |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3488212/ https://www.ncbi.nlm.nih.gov/pubmed/22844100 http://dx.doi.org/10.1093/nar/gks709 |