The Genomedata format for storing large-scale functional genomics data

Summary: We present a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. We have also developed utilities to load data into this format. We show that retrieving data from this format is more than 2900 times faster than a naive approach using wiggle files.

Bibliographic Details
Main authors: Hoffman, Michael M., Buske, Orion J., Noble, William Stafford
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2872006/
https://www.ncbi.nlm.nih.gov/pubmed/20435580
http://dx.doi.org/10.1093/bioinformatics/btq164
author Hoffman, Michael M.
Buske, Orion J.
Noble, William Stafford
collection PubMed
description Summary: We present a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. We have also developed utilities to load data into this format. We show that retrieving data from this format is more than 2900 times faster than a naive approach using wiggle files. Availability and Implementation: Reference implementation in Python and C components available at http://noble.gs.washington.edu/proj/genomedata/ under the GNU General Public License. Contact: william-noble@uw.edu
format Text
id pubmed-2872006
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2872006 2010-05-24 The Genomedata format for storing large-scale functional genomics data Hoffman, Michael M. Buske, Orion J. Noble, William Stafford Bioinformatics Applications Note Summary: We present a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. We have also developed utilities to load data into this format. We show that retrieving data from this format is more than 2900 times faster than a naive approach using wiggle files. Availability and Implementation: Reference implementation in Python and C components available at http://noble.gs.washington.edu/proj/genomedata/ under the GNU General Public License. Contact: william-noble@uw.edu Oxford University Press 2010-06-01 2010-04-29 /pmc/articles/PMC2872006/ /pubmed/20435580 http://dx.doi.org/10.1093/bioinformatics/btq164 Text en © The Author 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The Genomedata format for storing large-scale functional genomics data
topic Applications Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2872006/
https://www.ncbi.nlm.nih.gov/pubmed/20435580
http://dx.doi.org/10.1093/bioinformatics/btq164