
XSI—a genotype compression tool for compressive genomics in large biobanks



Bibliographic Details
Main authors: Wertenbroek, Rick, Rubinacci, Simone, Xenarios, Ioannis, Thoma, Yann, Delaneau, Olivier
Format: Online Article Text
Language: English
Published: Oxford University Press, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344850/
https://www.ncbi.nlm.nih.gov/pubmed/35748697
http://dx.doi.org/10.1093/bioinformatics/btac413
Description
Summary: MOTIVATION: Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever-increasing costs. To reduce this burden, we propose XSI, a file format with a reduced storage footprint that also allows computation on the compressed data, and we show how this can improve future analyses. RESULTS: We show that xSqueezeIt (XSI) allows for a file size reduction of [Formula: see text] compared with compressed BCF and demonstrate its potential for ‘compressive genomics’ on the UK Biobank whole-genome sequencing genotypes, with [Formula: see text] faster loading times, [Formula: see text] faster runs-of-homozygosity computation, [Formula: see text] faster dot product computation and [Formula: see text] faster allele counts. AVAILABILITY AND IMPLEMENTATION: The XSI file format specifications, API and command line tool are released under an open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
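The summary does not describe XSI's internal encoding, so the following is only a minimal sketch of the general "compressive genomics" idea it refers to: answering allele-count and dot-product queries directly on a compressed representation instead of on decompressed genotypes. It assumes a hypothetical sparse encoding of one bi-allelic variant that stores only the haplotype indices carrying the rarer allele. `SparseVariant`, `encode`, `alt_allele_count` and `dot` are illustrative names and are not part of the XSI API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SparseVariant:
    """Hypothetical sparse encoding of one bi-allelic variant (not the XSI format).

    Only the indices of haplotypes carrying the rarer allele are stored.
    """
    n_haplotypes: int
    minor_allele: int   # 0 or 1: which allele is the rarer one at this site
    carriers: List[int]  # sorted haplotype indices carrying `minor_allele`


def encode(haplotypes: Sequence[int]) -> SparseVariant:
    """Compress a dense 0/1 haplotype vector by keeping the rarer allele's positions."""
    n = len(haplotypes)
    ones = [i for i, a in enumerate(haplotypes) if a == 1]
    if len(ones) <= n - len(ones):
        return SparseVariant(n, 1, ones)
    zeros = [i for i, a in enumerate(haplotypes) if a == 0]
    return SparseVariant(n, 0, zeros)


def alt_allele_count(v: SparseVariant) -> int:
    """Count ALT (1) alleles without decompressing: the count follows from len(carriers)."""
    k = len(v.carriers)
    return k if v.minor_allele == 1 else v.n_haplotypes - k


def dot(v: SparseVariant, weights: Sequence[float]) -> float:
    """Dot product of the 0/1 ALT vector with `weights`, reading only stored indices.

    When the stored allele is REF (0), the ALT vector is all ones except at the
    stored indices, so the result is sum(weights) minus the stored contributions.
    In practice sum(weights) would be computed once and reused across variants.
    """
    s = sum(weights[i] for i in v.carriers)
    return s if v.minor_allele == 1 else sum(weights) - s


if __name__ == "__main__":
    rare = encode([0, 0, 1, 0, 0, 0, 1, 0])    # stores ALT positions [2, 6]
    common = encode([1, 1, 1, 0, 1, 1, 1, 1])  # stores REF position [3]
    w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(alt_allele_count(rare), dot(rare, w))      # 2 10.0
    print(alt_allele_count(common), dot(common, w))  # 7 32.0
```

Storing whichever allele is rarer keeps both rare and common sites small, and the cost of each query scales with the number of stored indices rather than with the number of haplotypes. This is one common way to make computation on compressed genotypes cheap; it is not a description of how XSI itself achieves the speedups reported above.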