SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants
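Main authors: | Kuksa, Pavel P; Lee, Chien-Yueh; Amlie-Wolf, Alexandre; Gangadharan, Prabhakaran; Mlynarski, Elizabeth E; Chou, Yi-Fan; Lin, Han-Jen; Issen, Heather; Greenfest-Allen, Emily; Valladares, Otto; Leung, Yuk Yee; Wang, Li-San |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320617/ https://www.ncbi.nlm.nih.gov/pubmed/32330239 http://dx.doi.org/10.1093/bioinformatics/btaa246 |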
field | value |
---|---|
author | Kuksa, Pavel P; Lee, Chien-Yueh; Amlie-Wolf, Alexandre; Gangadharan, Prabhakaran; Mlynarski, Elizabeth E; Chou, Yi-Fan; Lin, Han-Jen; Issen, Heather; Greenfest-Allen, Emily; Valladares, Otto; Leung, Yuk Yee; Wang, Li-San |
collection | PubMed |
description | SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional data across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources. AVAILABILITY AND IMPLEMENTATION: SparkINFERNO runs on clusters or on a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno. CONTACT: lswang@pennmedicine.upenn.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7320617 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7320617 2020-07-01. Bioinformatics, Applications Notes. Oxford University Press, 2020-06-15, 2020-04-24. /pmc/articles/PMC7320617/ /pubmed/32330239 http://dx.doi.org/10.1093/bioinformatics/btaa246. Text, en. © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
topic | Applications Notes |