Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes

Bibliographic Details
Main Authors: Himmelstein, Daniel S., Baranzini, Sergio E.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4497619/
https://www.ncbi.nlm.nih.gov/pubmed/26158728
http://dx.doi.org/10.1371/journal.pcbi.1004259
author Himmelstein, Daniel S.
Baranzini, Sergio E.
collection PubMed
description The first decade of genome-wide association studies (GWAS) has uncovered a wealth of disease-associated variants. Two important next steps are translating this information into a multiscale understanding of pathogenic variants and leveraging existing data to increase the power of existing and future studies through prioritization. We explore edge prediction on heterogeneous networks (graphs with multiple node and edge types) for accomplishing both tasks. First, we constructed a network with 18 node types (genes, diseases, tissues, pathophysiologies, and 14 MSigDB [Molecular Signatures Database] collections) and 19 edge types from high-throughput, publicly available resources. From this network, composed of 40,343 nodes and 1,608,168 edges, we extracted features that describe the topology between specific genes and diseases. Next, we trained a model on GWAS associations and predicted the probability of association between each protein-coding gene and each of 29 well-studied complex diseases. The model, which achieved 132-fold enrichment in precision at 10% recall, outperformed any individual domain, highlighting the benefit of integrative approaches. We identified pleiotropy, transcriptional signatures of perturbations, pathways, and protein interactions as influential mechanisms explaining pathogenesis. Our method successfully predicted the results (AUROC = 0.79) of a withheld multiple sclerosis (MS) GWAS despite starting with only 13 previously associated genes. Finally, we combined our network predictions with statistical evidence of association to propose four novel MS genes, three of which (JAK2, REL, RUNX3) were validated on the masked GWAS. Furthermore, our predictions provide biological support highlighting REL as the causal gene within its gene-rich locus. Users can browse all predictions online (http://het.io). Heterogeneous network edge prediction effectively prioritized genetic associations and provides a powerful new approach for data integration across multiple domains.
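
The description above mentions extracting features that describe the topology between a specific gene and a specific disease in a network with several node and edge types. The following is a minimal sketch of one way such a feature could be computed: counting the paths between a gene and a disease that follow a fixed sequence of edge and node types. It is an illustration under stated assumptions, not the authors' implementation. The class `HetNet`, the node names (`GENE_A`, `DISEASE_X`, and so on), the edge-type labels, and the two path templates are all hypothetical, and a plain path count stands in for whatever feature the article actually uses.

```python
from collections import defaultdict


class HetNet:
    """Toy heterogeneous network: typed nodes joined by typed, undirected edges."""

    def __init__(self):
        self.node_type = {}
        # adjacency[(node, edge_type)] -> set of neighbouring nodes
        self.adjacency = defaultdict(set)

    def add_node(self, name, kind):
        self.node_type[name] = kind

    def add_edge(self, a, b, edge_type):
        self.adjacency[(a, edge_type)].add(b)
        self.adjacency[(b, edge_type)].add(a)

    def path_count(self, source, target, template):
        """Count paths from source to target that follow `template`, a list of
        (edge_type, next_node_type) steps, without revisiting any node."""

        def walk(node, step, visited):
            if step == len(template):
                return 1 if node == target else 0
            edge_type, next_kind = template[step]
            total = 0
            for nxt in self.adjacency.get((node, edge_type), ()):
                if nxt in visited or self.node_type[nxt] != next_kind:
                    continue
                total += walk(nxt, step + 1, visited | {nxt})
            return total

        return walk(source, 0, {source})


# Hypothetical toy data; none of these nodes or edges come from the article.
net = HetNet()
for gene in ("GENE_A", "GENE_B", "GENE_C"):
    net.add_node(gene, "gene")
for disease in ("DISEASE_X", "DISEASE_Y"):
    net.add_node(disease, "disease")
net.add_node("TISSUE_T", "tissue")

net.add_edge("GENE_A", "GENE_B", "interaction")
net.add_edge("GENE_A", "GENE_C", "interaction")
net.add_edge("GENE_B", "DISEASE_X", "association")
net.add_edge("GENE_C", "DISEASE_X", "association")
net.add_edge("GENE_B", "DISEASE_Y", "association")
net.add_edge("GENE_A", "TISSUE_T", "expression")
net.add_edge("DISEASE_X", "TISSUE_T", "localization")

# Two hypothetical path templates from a gene to a disease.
GIGAD = [("interaction", "gene"), ("association", "disease")]
GETLD = [("expression", "tissue"), ("localization", "disease")]

# One feature per template for the (GENE_A, DISEASE_X) pair.
features = {
    "GiGaD": net.path_count("GENE_A", "DISEASE_X", GIGAD),
    "GeTlD": net.path_count("GENE_A", "DISEASE_X", GETLD),
}
print(features)  # {'GiGaD': 2, 'GeTlD': 1}
```

In a setup like this, each gene-disease pair yields one feature vector (one count per template), and a classifier trained on known associations could then score unobserved pairs. Which templates to include and how to weight highly connected nodes are design choices this sketch leaves open.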
format Online
Article
Text
id pubmed-4497619
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4497619 2015-07-14 PLoS Comput Biol Research Article Public Library of Science 2015-07-09 /pmc/articles/PMC4497619/ /pubmed/26158728 http://dx.doi.org/10.1371/journal.pcbi.1004259 Text en © 2015 Himmelstein, Baranzini http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article