
SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants

SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline characterizing non-coding genome-wide association study (GWAS) association findings. SparkINFERNO prioritizes causal variants underlying GWAS associa...


Bibliographic Details
Main authors: Kuksa, Pavel P; Lee, Chien-Yueh; Amlie-Wolf, Alexandre; Gangadharan, Prabhakaran; Mlynarski, Elizabeth E; Chou, Yi-Fan; Lin, Han-Jen; Issen, Heather; Greenfest-Allen, Emily; Valladares, Otto; Leung, Yuk Yee; Wang, Li-San
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320617/
https://www.ncbi.nlm.nih.gov/pubmed/32330239
http://dx.doi.org/10.1093/bioinformatics/btaa246
author Kuksa, Pavel P
Lee, Chien-Yueh
Amlie-Wolf, Alexandre
Gangadharan, Prabhakaran
Mlynarski, Elizabeth E
Chou, Yi-Fan
Lin, Han-Jen
Issen, Heather
Greenfest-Allen, Emily
Valladares, Otto
Leung, Yuk Yee
Wang, Li-San
collection PubMed
description SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources. AVAILABILITY AND IMPLEMENTATION: SparkINFERNO runs on clusters or a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno. CONTACT: lswang@pennmedicine.upenn.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7320617
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7320617 2020-07-01 Bioinformatics Applications Notes Oxford University Press 2020-06-15 2020-04-24 /pmc/articles/PMC7320617/ /pubmed/32330239 http://dx.doi.org/10.1093/bioinformatics/btaa246 Text en © The Author(s) 2020. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants
topic Applications Notes