Ultrafast and scalable variant annotation and prioritization with big functional genomics data
The advances of large-scale genomics studies have enabled compilation of cell type–specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalabili...
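Main authors: | Huang, Dandan; Yi, Xianfu; Zhou, Yao; Yao, Hongcheng; Xu, Hang; Wang, Jianhua; Zhang, Shijie; Nong, Wenyan; Wang, Panwen; Shi, Lei; Xuan, Chenghao; Li, Miaoxin; Wang, Junwen; Li, Weidong; Kwan, Hoi Shan; Sham, Pak Chung; Wang, Kai; Li, Mulin Jun |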
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press, 2020 |
Subjects: | Method |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7706736/ https://www.ncbi.nlm.nih.gov/pubmed/33060171 http://dx.doi.org/10.1101/gr.267997.120 |
_version_ | 1783617211850555392 |
---|---|
author | Huang, Dandan Yi, Xianfu Zhou, Yao Yao, Hongcheng Xu, Hang Wang, Jianhua Zhang, Shijie Nong, Wenyan Wang, Panwen Shi, Lei Xuan, Chenghao Li, Miaoxin Wang, Junwen Li, Weidong Kwan, Hoi Shan Sham, Pak Chung Wang, Kai Li, Mulin Jun |
author_facet | Huang, Dandan Yi, Xianfu Zhou, Yao Yao, Hongcheng Xu, Hang Wang, Jianhua Zhang, Shijie Nong, Wenyan Wang, Panwen Shi, Lei Xuan, Chenghao Li, Miaoxin Wang, Junwen Li, Weidong Kwan, Hoi Shan Sham, Pak Chung Wang, Kai Li, Mulin Jun |
author_sort | Huang, Dandan |
collection | PubMed |
description | The advances of large-scale genomics studies have enabled compilation of cell type–specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalability to process big genomic data, particularly when annotating whole-genome sequencing variants against a huge database with billions of genomic features. Here, we develop VarNote to rapidly annotate genome-scale variants in large and complex functional annotation resources. Equipped with a novel index system and a parallel random-sweep searching algorithm, VarNote shows substantial performance improvements (two to three orders of magnitude) over existing algorithms at different scales. It supports both region-based and allele-specific annotations and introduces advanced functions for the flexible extraction of annotations. By integrating massive base-wise and context-dependent annotations in the VarNote framework, we introduce three efficient and accurate pipelines to prioritize the causal regulatory variants for common diseases, Mendelian disorders, and cancers. |
format | Online Article Text |
id | pubmed-7706736 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-77067362021-06-01 Ultrafast and scalable variant annotation and prioritization with big functional genomics data Huang, Dandan Yi, Xianfu Zhou, Yao Yao, Hongcheng Xu, Hang Wang, Jianhua Zhang, Shijie Nong, Wenyan Wang, Panwen Shi, Lei Xuan, Chenghao Li, Miaoxin Wang, Junwen Li, Weidong Kwan, Hoi Shan Sham, Pak Chung Wang, Kai Li, Mulin Jun Genome Res Method The advances of large-scale genomics studies have enabled compilation of cell type–specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalability to process big genomic data, particularly when annotating whole-genome sequencing variants against a huge database with billions of genomic features. Here, we develop VarNote to rapidly annotate genome-scale variants in large and complex functional annotation resources. Equipped with a novel index system and a parallel random-sweep searching algorithm, VarNote shows substantial performance improvements (two to three orders of magnitude) over existing algorithms at different scales. It supports both region-based and allele-specific annotations and introduces advanced functions for the flexible extraction of annotations. By integrating massive base-wise and context-dependent annotations in the VarNote framework, we introduce three efficient and accurate pipelines to prioritize the causal regulatory variants for common diseases, Mendelian disorders, and cancers. Cold Spring Harbor Laboratory Press 2020-12 /pmc/articles/PMC7706736/ /pubmed/33060171 http://dx.doi.org/10.1101/gr.267997.120 Text en © 2020 Huang et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). 
After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. |
spellingShingle | Method Huang, Dandan Yi, Xianfu Zhou, Yao Yao, Hongcheng Xu, Hang Wang, Jianhua Zhang, Shijie Nong, Wenyan Wang, Panwen Shi, Lei Xuan, Chenghao Li, Miaoxin Wang, Junwen Li, Weidong Kwan, Hoi Shan Sham, Pak Chung Wang, Kai Li, Mulin Jun Ultrafast and scalable variant annotation and prioritization with big functional genomics data |
title | Ultrafast and scalable variant annotation and prioritization with big functional genomics data |
title_full | Ultrafast and scalable variant annotation and prioritization with big functional genomics data |
title_fullStr | Ultrafast and scalable variant annotation and prioritization with big functional genomics data |
title_full_unstemmed | Ultrafast and scalable variant annotation and prioritization with big functional genomics data |
title_short | Ultrafast and scalable variant annotation and prioritization with big functional genomics data |
title_sort | ultrafast and scalable variant annotation and prioritization with big functional genomics data |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7706736/ https://www.ncbi.nlm.nih.gov/pubmed/33060171 http://dx.doi.org/10.1101/gr.267997.120 |
work_keys_str_mv | AT huangdandan ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT yixianfu ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT zhouyao ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT yaohongcheng ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT xuhang ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT wangjianhua ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT zhangshijie ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT nongwenyan ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT wangpanwen ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT shilei ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT xuanchenghao ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT limiaoxin ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT wangjunwen ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT liweidong ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT kwanhoishan ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT shampakchung ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT wangkai ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata AT limulinjun ultrafastandscalablevariantannotationandprioritizationwithbigfunctionalgenomicsdata |