XSI—a genotype compression tool for compressive genomics in large biobanks
MOTIVATION: Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever increasing costs. To reduce this burden, we propose XSI, a file format with reduced storage footprint that also allow...
Main authors: | Wertenbroek, Rick; Rubinacci, Simone; Xenarios, Ioannis; Thoma, Yann; Delaneau, Olivier |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344850/ https://www.ncbi.nlm.nih.gov/pubmed/35748697 http://dx.doi.org/10.1093/bioinformatics/btac413 |
_version_ | 1784761305432850432 |
---|---|
author | Wertenbroek, Rick Rubinacci, Simone Xenarios, Ioannis Thoma, Yann Delaneau, Olivier |
author_facet | Wertenbroek, Rick Rubinacci, Simone Xenarios, Ioannis Thoma, Yann Delaneau, Olivier |
author_sort | Wertenbroek, Rick |
collection | PubMed |
description | MOTIVATION: Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever-increasing costs. To reduce this burden, we propose XSI, a file format with a reduced storage footprint that also allows computation on the compressed data and we show how this can improve future analyses. RESULTS: We show that xSqueezeIt (XSI) allows for a file size reduction of [Formula: see text] compared with compressed BCF and demonstrate its potential for ‘compressive genomics’ on the UK Biobank whole-genome sequencing genotypes with [Formula: see text] faster loading times, [Formula: see text] faster run of homozygosity computation, [Formula: see text] faster dot products computation and [Formula: see text] faster allele counts. AVAILABILITY AND IMPLEMENTATION: The XSI file format specifications, API and command line tool are released under an open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9344850 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-93448502022-08-03 XSI—a genotype compression tool for compressive genomics in large biobanks Wertenbroek, Rick Rubinacci, Simone Xenarios, Ioannis Thoma, Yann Delaneau, Olivier Bioinformatics Original Papers MOTIVATION: Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever-increasing costs. To reduce this burden, we propose XSI, a file format with a reduced storage footprint that also allows computation on the compressed data and we show how this can improve future analyses. RESULTS: We show that xSqueezeIt (XSI) allows for a file size reduction of [Formula: see text] compared with compressed BCF and demonstrate its potential for ‘compressive genomics’ on the UK Biobank whole-genome sequencing genotypes with [Formula: see text] faster loading times, [Formula: see text] faster run of homozygosity computation, [Formula: see text] faster dot products computation and [Formula: see text] faster allele counts. AVAILABILITY AND IMPLEMENTATION: The XSI file format specifications, API and command line tool are released under an open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2022-06-24 /pmc/articles/PMC9344850/ /pubmed/35748697 http://dx.doi.org/10.1093/bioinformatics/btac413 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Wertenbroek, Rick Rubinacci, Simone Xenarios, Ioannis Thoma, Yann Delaneau, Olivier XSI—a genotype compression tool for compressive genomics in large biobanks |
title | XSI—a genotype compression tool for compressive genomics in large biobanks |
title_full | XSI—a genotype compression tool for compressive genomics in large biobanks |
title_fullStr | XSI—a genotype compression tool for compressive genomics in large biobanks |
title_full_unstemmed | XSI—a genotype compression tool for compressive genomics in large biobanks |
title_short | XSI—a genotype compression tool for compressive genomics in large biobanks |
title_sort | xsi—a genotype compression tool for compressive genomics in large biobanks |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344850/ https://www.ncbi.nlm.nih.gov/pubmed/35748697 http://dx.doi.org/10.1093/bioinformatics/btac413 |