Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels
Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variant...
Main authors: | Pedersen, Brent S; de Ridder, Jeroen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9841399/ https://www.ncbi.nlm.nih.gov/pubmed/36300617 http://dx.doi.org/10.1093/nar/gkac931 |
_version_ | 1784869828607082496 |
---|---|
author | Pedersen, Brent S; de Ridder, Jeroen |
author_facet | Pedersen, Brent S; de Ridder, Jeroen |
author_sort | Pedersen, Brent S |
collection | PubMed |
description | Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variants and annotation fields into a compressed archive that can be used for rapid variant annotation and filtering. Most variants, represented by chromosome, position and alleles, are encoded into 32 bits, half the size of previous encoding schemes and at least 4 times smaller than a naive encoding. The annotations, stored separately within the same archive, are also encoded and compressed. We show that echtvar is faster and uses less space than existing tools and that it can effectively reduce the number of candidate variants. We give examples on germ-line and somatic variants to document how echtvar can facilitate exploratory data analysis on genetic variants. Echtvar is available at https://github.com/brentp/echtvar under an MIT license. |
format | Online Article Text |
id | pubmed-9841399 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-98413992023-01-18 Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels Pedersen, Brent S de Ridder, Jeroen Nucleic Acids Res Methods Online Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variants and annotation fields into a compressed archive that can be used for rapid variant annotation and filtering. Most variants, represented by chromosome, position and alleles, are encoded into 32 bits, half the size of previous encoding schemes and at least 4 times smaller than a naive encoding. The annotations, stored separately within the same archive, are also encoded and compressed. We show that echtvar is faster and uses less space than existing tools and that it can effectively reduce the number of candidate variants. We give examples on germ-line and somatic variants to document how echtvar can facilitate exploratory data analysis on genetic variants. Echtvar is available at https://github.com/brentp/echtvar under an MIT license. Oxford University Press 2022-10-27 /pmc/articles/PMC9841399/ /pubmed/36300617 http://dx.doi.org/10.1093/nar/gkac931 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Pedersen, Brent S de Ridder, Jeroen Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels |
title | Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels |
title_full | Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels |
title_fullStr | Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels |
title_full_unstemmed | Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels |
title_short | Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels |
title_sort | echtvar: compressed variant representation for rapid annotation and filtering of snps and indels |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9841399/ https://www.ncbi.nlm.nih.gov/pubmed/36300617 http://dx.doi.org/10.1093/nar/gkac931 |
work_keys_str_mv | AT pedersenbrents echtvarcompressedvariantrepresentationforrapidannotationandfilteringofsnpsandindels AT deridderjeroen echtvarcompressedvariantrepresentationforrapidannotationandfilteringofsnpsandindels |
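
The description above says that most variants are encoded into 32 bits. The sketch below shows one way a position and a pair of short alleles could be packed into a 32-bit integer. It is an illustration only: the bit layout, the 2**20-base chunk size, the rule that chromosome and chunk index are stored outside the 32-bit value, and the names `encode` and `decode` are all assumptions made for this example, not echtvar's documented format.

```python
# Hypothetical 32-bit layout (an assumption for illustration, not echtvar's format):
#   bits 31..12  offset of the position within a 2**20-base chunk (20 bits)
#   bits 11..10  length of the reference allele, 1..3             (2 bits)
#   bits  9..8   length of the alternate allele, 1..3             (2 bits)
#   bits  7..0   up to 4 bases (ref followed by alt), 2 bits each (8 bits)
CHUNK_BITS = 20
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode(pos, ref, alt):
    """Return (chunk_index, code), or None when the variant does not fit."""
    alleles = ref + alt
    if not (1 <= len(ref) <= 3 and 1 <= len(alt) <= 3 and len(alleles) <= 4):
        return None  # long alleles would need a separate representation
    if any(base not in BASE_CODE for base in alleles):
        return None  # e.g. N or symbolic alleles
    bases = 0
    for i, base in enumerate(alleles):
        bases |= BASE_CODE[base] << (2 * i)
    offset = pos & ((1 << CHUNK_BITS) - 1)
    code = (offset << 12) | (len(ref) << 10) | (len(alt) << 8) | bases
    return pos >> CHUNK_BITS, code


def decode(chunk_index, code):
    """Invert encode(): return (pos, ref, alt)."""
    offset = code >> 12
    ref_len = (code >> 10) & 0b11
    alt_len = (code >> 8) & 0b11
    bases = code & 0xFF
    alleles = "".join(
        CODE_BASE[(bases >> (2 * i)) & 0b11] for i in range(ref_len + alt_len)
    )
    pos = (chunk_index << CHUNK_BITS) | offset
    return pos, alleles[:ref_len], alleles[ref_len:]


if __name__ == "__main__":
    chunk_index, code = encode(1_234_567, "A", "T")
    print(chunk_index, hex(code), decode(chunk_index, code))
```

Placing the offset in the high bits means that, within one chunk, sorting the integer codes also sorts the variants by position. Variants for which `encode` returns `None` would have to be stored some other way, which matches the description's wording that "most", not all, variants fit in 32 bits.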