Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels

Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variant...

Bibliographic Details
Main Authors: Pedersen, Brent S, de Ridder, Jeroen
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9841399/
https://www.ncbi.nlm.nih.gov/pubmed/36300617
http://dx.doi.org/10.1093/nar/gkac931
_version_ 1784869828607082496
author Pedersen, Brent S
de Ridder, Jeroen
author_facet Pedersen, Brent S
de Ridder, Jeroen
author_sort Pedersen, Brent S
collection PubMed
description Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variants and annotation fields into a compressed archive that can be used for rapid variant annotation and filtering. Most variants, represented by chromosome, position and alleles, are encoded into 32 bits, half the size of previous encoding schemes and at least 4 times smaller than a naive encoding. The annotations, stored separately within the same archive, are also encoded and compressed. We show that echtvar is faster and uses less space than existing tools and that it can effectively reduce the number of candidate variants. We give examples on germline and somatic variants to document how echtvar can facilitate exploratory data analysis on genetic variants. Echtvar is available at https://github.com/brentp/echtvar under an MIT license.
format Online
Article
Text
id pubmed-9841399
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9841399 2023-01-18 Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels Pedersen, Brent S de Ridder, Jeroen Nucleic Acids Res Methods Online Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variants and annotation fields into a compressed archive that can be used for rapid variant annotation and filtering. Most variants, represented by chromosome, position and alleles, are encoded into 32 bits, half the size of previous encoding schemes and at least 4 times smaller than a naive encoding. The annotations, stored separately within the same archive, are also encoded and compressed. We show that echtvar is faster and uses less space than existing tools and that it can effectively reduce the number of candidate variants. We give examples on germline and somatic variants to document how echtvar can facilitate exploratory data analysis on genetic variants. Echtvar is available at https://github.com/brentp/echtvar under an MIT license. Oxford University Press 2022-10-27 /pmc/articles/PMC9841399/ /pubmed/36300617 http://dx.doi.org/10.1093/nar/gkac931 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Online
Pedersen, Brent S
de Ridder, Jeroen
Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels
title Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels
title_full Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels
title_fullStr Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels
title_full_unstemmed Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels
title_short Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels
title_sort echtvar: compressed variant representation for rapid annotation and filtering of snps and indels
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9841399/
https://www.ncbi.nlm.nih.gov/pubmed/36300617
http://dx.doi.org/10.1093/nar/gkac931
work_keys_str_mv AT pedersenbrents echtvarcompressedvariantrepresentationforrapidannotationandfilteringofsnpsandindels
AT deridderjeroen echtvarcompressedvariantrepresentationforrapidannotationandfilteringofsnpsandindels
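
Illustrative note: the description above states that most variants (chromosome, position and alleles) fit into 32 bits. The Python sketch below shows one way such a packing could work. The bit layout is an assumption made for illustration only and is not taken from the echtvar source: 20 bits for the offset within a 2^20-base chunk, 2 bits each for the REF and ALT lengths, and 8 bits for up to four bases at 2 bits per base. The chromosome and chunk index are assumed to be stored once per chunk rather than per variant. The names `encode`, `decode`, `CHUNK_BITS` and `BASE_CODE` are hypothetical.

```python
# Hypothetical 32-bit variant packing (illustrative layout, not echtvar's own).
# bits 31..12: offset within a 2**20-base chunk
# bits 11..10: REF length, bits 9..8: ALT length
# bits  7..0 : up to four bases, 2 bits each, left-aligned
CHUNK_BITS = 20
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode(pos, ref, alt):
    """Return (chunk_index, 32-bit code), or None if the variant does not fit."""
    alleles = ref + alt
    if not (1 <= len(ref) <= 3 and 1 <= len(alt) <= 3) or len(alleles) > 4:
        return None  # long alleles would need a separate, unpacked table
    if any(base not in BASE_CODE for base in alleles):
        return None  # e.g. N or symbolic alleles
    chunk = pos >> CHUNK_BITS
    offset = pos & ((1 << CHUNK_BITS) - 1)
    bases = 0
    for base in alleles:
        bases = (bases << 2) | BASE_CODE[base]
    bases <<= 2 * (4 - len(alleles))  # left-align within the 8 base bits
    code = (offset << 12) | (len(ref) << 10) | (len(alt) << 8) | bases
    return chunk, code


def decode(chunk, code):
    """Invert encode(): return (pos, ref, alt)."""
    offset = code >> 12
    ref_len = (code >> 10) & 3
    alt_len = (code >> 8) & 3
    bases = code & 0xFF
    seq = "".join(
        CODE_BASE[(bases >> (6 - 2 * i)) & 3] for i in range(ref_len + alt_len)
    )
    return (chunk << CHUNK_BITS) | offset, seq[:ref_len], seq[ref_len:]


if __name__ == "__main__":
    packed = encode(1_234_567, "A", "T")
    print(packed, decode(*packed))
```

Because the offset occupies the high bits in this layout, sorting the integer codes within a chunk also sorts the variants by position, which is one reason a packing of this kind suits fast lookup. Variants that return `None` here would have to be kept in a supplementary structure.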