
Ultrafast and scalable variant annotation and prioritization with big functional genomics data

The advances of large-scale genomics studies have enabled compilation of cell type–specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalabili...


Bibliographic Details
Main authors: Huang, Dandan, Yi, Xianfu, Zhou, Yao, Yao, Hongcheng, Xu, Hang, Wang, Jianhua, Zhang, Shijie, Nong, Wenyan, Wang, Panwen, Shi, Lei, Xuan, Chenghao, Li, Miaoxin, Wang, Junwen, Li, Weidong, Kwan, Hoi Shan, Sham, Pak Chung, Wang, Kai, Li, Mulin Jun
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7706736/
https://www.ncbi.nlm.nih.gov/pubmed/33060171
http://dx.doi.org/10.1101/gr.267997.120
_version_ 1783617211850555392
author Huang, Dandan
Yi, Xianfu
Zhou, Yao
Yao, Hongcheng
Xu, Hang
Wang, Jianhua
Zhang, Shijie
Nong, Wenyan
Wang, Panwen
Shi, Lei
Xuan, Chenghao
Li, Miaoxin
Wang, Junwen
Li, Weidong
Kwan, Hoi Shan
Sham, Pak Chung
Wang, Kai
Li, Mulin Jun
author_facet Huang, Dandan
Yi, Xianfu
Zhou, Yao
Yao, Hongcheng
Xu, Hang
Wang, Jianhua
Zhang, Shijie
Nong, Wenyan
Wang, Panwen
Shi, Lei
Xuan, Chenghao
Li, Miaoxin
Wang, Junwen
Li, Weidong
Kwan, Hoi Shan
Sham, Pak Chung
Wang, Kai
Li, Mulin Jun
author_sort Huang, Dandan
collection PubMed
description The advances of large-scale genomics studies have enabled compilation of cell type–specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalability to process big genomic data, particularly when annotating whole-genome sequencing variants against a huge database with billions of genomic features. Here, we develop VarNote to rapidly annotate genome-scale variants in large and complex functional annotation resources. Equipped with a novel index system and a parallel random-sweep searching algorithm, VarNote shows substantial performance improvements (two to three orders of magnitude) over existing algorithms at different scales. It supports both region-based and allele-specific annotations and introduces advanced functions for the flexible extraction of annotations. By integrating massive base-wise and context-dependent annotations in the VarNote framework, we introduce three efficient and accurate pipelines to prioritize the causal regulatory variants for common diseases, Mendelian disorders, and cancers.
format Online
Article
Text
id pubmed-7706736
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-77067362021-06-01 Ultrafast and scalable variant annotation and prioritization with big functional genomics data Huang, Dandan Yi, Xianfu Zhou, Yao Yao, Hongcheng Xu, Hang Wang, Jianhua Zhang, Shijie Nong, Wenyan Wang, Panwen Shi, Lei Xuan, Chenghao Li, Miaoxin Wang, Junwen Li, Weidong Kwan, Hoi Shan Sham, Pak Chung Wang, Kai Li, Mulin Jun Genome Res Method The advances of large-scale genomics studies have enabled compilation of cell type–specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalability to process big genomic data, particularly when annotating whole-genome sequencing variants against a huge database with billions of genomic features. Here, we develop VarNote to rapidly annotate genome-scale variants in large and complex functional annotation resources. Equipped with a novel index system and a parallel random-sweep searching algorithm, VarNote shows substantial performance improvements (two to three orders of magnitude) over existing algorithms at different scales. It supports both region-based and allele-specific annotations and introduces advanced functions for the flexible extraction of annotations. By integrating massive base-wise and context-dependent annotations in the VarNote framework, we introduce three efficient and accurate pipelines to prioritize the causal regulatory variants for common diseases, Mendelian disorders, and cancers. Cold Spring Harbor Laboratory Press 2020-12 /pmc/articles/PMC7706736/ /pubmed/33060171 http://dx.doi.org/10.1101/gr.267997.120 Text en © 2020 Huang et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). 
After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
spellingShingle Method
Huang, Dandan
Yi, Xianfu
Zhou, Yao
Yao, Hongcheng
Xu, Hang
Wang, Jianhua
Zhang, Shijie
Nong, Wenyan
Wang, Panwen
Shi, Lei
Xuan, Chenghao
Li, Miaoxin
Wang, Junwen
Li, Weidong
Kwan, Hoi Shan
Sham, Pak Chung
Wang, Kai
Li, Mulin Jun
Ultrafast and scalable variant annotation and prioritization with big functional genomics data
title Ultrafast and scalable variant annotation and prioritization with big functional genomics data
title_full Ultrafast and scalable variant annotation and prioritization with big functional genomics data
title_fullStr Ultrafast and scalable variant annotation and prioritization with big functional genomics data
title_full_unstemmed Ultrafast and scalable variant annotation and prioritization with big functional genomics data
title_short Ultrafast and scalable variant annotation and prioritization with big functional genomics data
title_sort ultrafast and scalable variant annotation and prioritization with big functional genomics data
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7706736/
https://www.ncbi.nlm.nih.gov/pubmed/33060171
http://dx.doi.org/10.1101/gr.267997.120