Cargando…
XSI—a genotype compression tool for compressive genomics in large biobanks
MOTIVATION: Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever increasing costs. To reduce this burden, we propose XSI, a file format with reduced storage footprint that also allow...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Oxford University Press
2022
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344850/ https://www.ncbi.nlm.nih.gov/pubmed/35748697 http://dx.doi.org/10.1093/bioinformatics/btac413 |
Sumario: | MOTIVATION: Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever increasing costs. To reduce this burden, we propose XSI, a file format with reduced storage footprint that also allows computation on the compressed data and we show how this can improve future analyses. RESULTS: We show that xSqueezeIt (XSI) allows for a file size reduction of [Formula: see text] compared with compressed BCF and demonstrate its potential for ‘compressive genomics’ on the UK Biobank whole-genome sequencing genotypes with [Formula: see text] faster loading times, [Formula: see text] faster run of homozygozity computation, [Formula: see text] faster dot products computation and [Formula: see text] faster allele counts. AVAILABILITY AND IMPLEMENTATION: The XSI file format specifications, API and command line tool are released under open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
---|