SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants

SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline characterizing non-coding genome-wide association study (GWAS) association findings. SparkINFERNO prioritizes causal variants underlying GWAS associa...

Bibliographic Details
Main Authors:	Kuksa, Pavel P; Lee, Chien-Yueh; Amlie-Wolf, Alexandre; Gangadharan, Prabhakaran; Mlynarski, Elizabeth E; Chou, Yi-Fan; Lin, Han-Jen; Issen, Heather; Greenfest-Allen, Emily; Valladares, Otto; Leung, Yuk Yee; Wang, Li-San
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Applications Notes
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320617/ https://www.ncbi.nlm.nih.gov/pubmed/32330239 http://dx.doi.org/10.1093/bioinformatics/btaa246

author	Kuksa, Pavel P; Lee, Chien-Yueh; Amlie-Wolf, Alexandre; Gangadharan, Prabhakaran; Mlynarski, Elizabeth E; Chou, Yi-Fan; Lin, Han-Jen; Issen, Heather; Greenfest-Allen, Emily; Valladares, Otto; Leung, Yuk Yee; Wang, Li-San
collection	PubMed
description	SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline characterizing non-coding genome-wide association study (GWAS) association findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that SparkINFERNO is more than 60 times more efficient and scales with data size and the amount of computational resources. AVAILABILITY AND IMPLEMENTATION: SparkINFERNO runs on clusters or a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno. CONTACT: lswang@pennmedicine.upenn.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-7320617
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7320617 Bioinformatics Applications Notes Oxford University Press 2020-06-15 2020-04-24 © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants
topic	Applications Notes
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320617/ https://www.ncbi.nlm.nih.gov/pubmed/32330239 http://dx.doi.org/10.1093/bioinformatics/btaa246