XSI—a genotype compression tool for compressive genomics in large biobanks

Bibliographic details
Main authors: Wertenbroek, Rick; Rubinacci, Simone; Xenarios, Ioannis; Thoma, Yann; Delaneau, Olivier
Format: Online article, text
Language: English
Published: Oxford University Press, 2022
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344850/
https://www.ncbi.nlm.nih.gov/pubmed/35748697
http://dx.doi.org/10.1093/bioinformatics/btac413

Abstract
MOTIVATION: The generation of genotype data has grown exponentially over the last decade. The large size of recent datasets brings a storage and computational burden with ever-increasing costs. To reduce this burden, we propose XSI, a file format with a reduced storage footprint that also allows computation on the compressed data, and we show how this can improve future analyses.
RESULTS: We show that xSqueezeIt (XSI) allows for a file size reduction of [Formula: see text] compared with compressed BCF, and we demonstrate its potential for ‘compressive genomics’ on the UK Biobank whole-genome sequencing genotypes with [Formula: see text] faster loading times, [Formula: see text] faster computation of runs of homozygosity, [Formula: see text] faster dot product computation and [Formula: see text] faster allele counts.
AVAILABILITY AND IMPLEMENTATION: The XSI file format specifications, API and command-line tool are released under an open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Journal: Bioinformatics (Original Papers), Oxford University Press, 2022-06-24
License: © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.