
The probability of edge existence due to node degree: a baseline for network-based predictions


Bibliographic Details
Main authors: Zietz, Michael; Himmelstein, Daniel S.; Kloster, Kyle; Williams, Christopher; Nagle, Michael W.; Greene, Casey S.
Format: Online article, text
Language: English
Published: Cold Spring Harbor Laboratory, 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881952/
https://www.ncbi.nlm.nih.gov/pubmed/36711569
http://dx.doi.org/10.1101/2023.01.05.522939
collection PubMed
description Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network’s specific connections. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Degree’s predictive performance diminishes when the networks used for training and testing—despite measuring the same biological relationships—were generated using distinct techniques and hence have large differences in degree distribution. We introduce the permutation-derived edge prior as the probability that an edge exists based only on degree. The edge prior shows excellent discrimination and calibration for 20 biomedical networks (16 bipartite, 3 undirected, 1 directed), with AUROCs frequently exceeding 0.85. Researchers seeking to predict new or missing edges in biological networks should use the edge prior as a baseline to identify the fraction of performance that is nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
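The abstract relies on two computational ideas. The first is permuting a network while keeping every node's degree fixed. The second is estimating each node pair's edge prior as the fraction of permuted networks that contain that edge. Below is a minimal pure-Python sketch of both for a bipartite network, written only to illustrate those ideas. It is not the xswap package's API. The function names (`degree_preserving_swap`, `edge_prior`, `auroc`), the swap-attempt budget (`multiplier` times the number of edges), and the toy gene-disease network are all hypothetical. The sketch also assumes that node labels are sortable (for example, strings) and that duplicate edges are not allowed.

```python
import random
from collections import Counter
from itertools import product


def degree_preserving_swap(edges, multiplier=10, seed=0):
    """Return a permuted copy of a bipartite edge set with every node's degree unchanged.

    Illustrative sketch, not the xswap package's API. Each attempt picks two
    edges (a, b) and (c, d) and proposes (a, d), (c, b). The attempt is
    rejected if either proposed edge already exists, so no duplicates appear.
    """
    rng = random.Random(seed)
    edge_list = sorted(edges)  # sorted so that a given seed is reproducible
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_set
    for _ in range(multiplier * len(edge_list)):  # assumed swap budget
        i = rng.randrange(len(edge_list))
        j = rng.randrange(len(edge_list))
        if i == j:
            continue
        (a, b), (c, d) = edge_list[i], edge_list[j]
        new_i, new_j = (a, d), (c, b)
        if new_i in edge_set or new_j in edge_set:
            continue  # would duplicate an edge (this also covers a == c or b == d)
        edge_set.difference_update({(a, b), (c, d)})
        edge_set.update({new_i, new_j})
        edge_list[i], edge_list[j] = new_i, new_j
    return edge_set


def edge_prior(edges, n_permutations=100, multiplier=10):
    """Fraction of degree-preserving permutations that contain each (source, target) pair."""
    counts = Counter()
    for seed in range(n_permutations):
        counts.update(degree_preserving_swap(edges, multiplier, seed))
    sources = {u for u, _ in edges}
    targets = {v for _, v in edges}
    return {pair: counts[pair] / n_permutations for pair in product(sources, targets)}


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy gene-disease network in which g1 is a hub linked to every disease.
    edges = {("g1", "d1"), ("g1", "d2"), ("g1", "d3"), ("g2", "d1"), ("g3", "d2")}
    prior = edge_prior(edges, n_permutations=200)
    pairs = sorted(prior)
    score = auroc([prior[p] for p in pairs], [p in edges for p in pairs])
    print(f"edge-prior AUROC on the toy network: {score:.2f}")
```

In the toy network, the hub g1 must connect to all three diseases in every permutation, so its pairs receive a prior of 1.0. This is the kind of degree-driven signal the abstract warns about. Here the prior is fit on the same network it is scored against, so the printed AUROC only demonstrates the mechanics. A real baseline comparison would score the prior against held-out or independently generated edges. A swap that would duplicate an edge is rejected instead of retried, which keeps the edge count and every node's degree exactly unchanged.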
id pubmed-9881952
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source bioRxiv, Cold Spring Harbor Laboratory, 2023-01-06 (record dated 2023-01-28)
license Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.