Highly scalable maximum likelihood and conjugate Bayesian inference for ERGMs on graph sets with equivalent vertices
The exponential family random graph modeling (ERGM) framework provides a highly flexible approach for the statistical analysis of networks (i.e., graphs). As ERGMs with dyadic dependence involve normalizing factors that are extremely costly to compute, practical strategies for ERGM inference genera...
Main authors: | Yin, Fan; Butts, Carter T. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9417041/ https://www.ncbi.nlm.nih.gov/pubmed/36018834 http://dx.doi.org/10.1371/journal.pone.0273039 |
_version_ | 1784776612518035456 |
---|---|
author | Yin, Fan Butts, Carter T. |
author_facet | Yin, Fan Butts, Carter T. |
author_sort | Yin, Fan |
collection | PubMed |
description | The exponential family random graph modeling (ERGM) framework provides a highly flexible approach for the statistical analysis of networks (i.e., graphs). As ERGMs with dyadic dependence involve normalizing factors that are extremely costly to compute, practical strategies for ERGM inference generally employ a variety of approximations or other workarounds. Markov Chain Monte Carlo maximum likelihood (MCMC MLE) provides a powerful tool to approximate the maximum likelihood estimator (MLE) of ERGM parameters, and is generally feasible for typical models on single networks with as many as a few thousand nodes. MCMC-based algorithms for Bayesian analysis are more expensive, and high-quality answers are challenging to obtain on large graphs. For both strategies, extension to the pooled case, in which we observe multiple networks from a common generative process, adds further computational cost, with both time and memory scaling linearly in the number of graphs. This becomes prohibitive for large networks, or for cases in which large numbers of graph observations are available. Here, we exploit some basic properties of discrete exponential families to develop an approach for ERGM inference in the pooled case that (where applicable) allows an arbitrarily large number of graph observations to be fit at no additional computational cost beyond preprocessing the data itself. Moreover, a variant of our approach can also be used to perform Bayesian inference under conjugate priors, again with no additional computational cost in the estimation phase. The latter can be employed either for single graph observations or for observations from graph sets. As we show, the conjugate prior is easily specified and is well suited to applications such as regularization. Simulation studies show that the pooled method leads to estimates with good frequentist properties, and that posterior estimates under the conjugate prior are well behaved. We demonstrate the usefulness of our approach with applications to pooled analysis of brain functional connectivity networks and to replicated x-ray crystal structures of hen egg-white lysozyme. |
format | Online Article Text |
id | pubmed-9417041 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9417041 2022-08-27 Highly scalable maximum likelihood and conjugate Bayesian inference for ERGMs on graph sets with equivalent vertices Yin, Fan Butts, Carter T. PLoS One Research Article Public Library of Science 2022-08-26 /pmc/articles/PMC9417041/ /pubmed/36018834 http://dx.doi.org/10.1371/journal.pone.0273039 Text en © 2022 Yin, Butts https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Yin, Fan Butts, Carter T. Highly scalable maximum likelihood and conjugate Bayesian inference for ERGMs on graph sets with equivalent vertices |
title | Highly scalable maximum likelihood and conjugate Bayesian inference for ERGMs on graph sets with equivalent vertices |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9417041/ https://www.ncbi.nlm.nih.gov/pubmed/36018834 http://dx.doi.org/10.1371/journal.pone.0273039 |
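
The description above says that pooled inference can be made independent of the number of observed graphs by exploiting basic exponential family properties. The sketch below illustrates only the generic identity behind that claim, not the authors' implementation: for graphs drawn independently from one ERGM on the same vertex set, the pooled log-likelihood depends on the data only through the graph count and the mean sufficient statistic, and the standard exponential family conjugate prior updates by averaging statistics. Everything here is an assumption chosen for illustration: the 4-vertex support (small enough to enumerate exactly), the edge and triangle statistics, and all function names.

```python
import itertools
import math

N = 4  # hypothetical vertex count: small enough to enumerate all 2^6 graphs
PAIRS = list(itertools.combinations(range(N), 2))


def stats(edges):
    """Sufficient statistics (edge count, triangle count) of one graph.

    `edges` is an iterable of vertex pairs (a, b) with a < b.
    """
    e = set(edges)
    triangles = sum(
        1
        for a, b, c in itertools.combinations(range(N), 3)
        if {(a, b), (a, c), (b, c)} <= e
    )
    return (len(e), triangles)


# Statistics of every graph in the support, computed once.
SUPPORT = [
    stats([p for p, keep in zip(PAIRS, bits) if keep])
    for bits in itertools.product([0, 1], repeat=len(PAIRS))
]


def log_z(theta):
    """Exact log normalizing factor by brute-force enumeration."""
    return math.log(
        sum(math.exp(theta[0] * s[0] + theta[1] * s[1]) for s in SUPPORT)
    )


def loglik_per_graph(theta, graphs):
    """Naive pooled log-likelihood: one term per observed graph."""
    lz = log_z(theta)
    return sum(
        theta[0] * s[0] + theta[1] * s[1] - lz for s in map(stats, graphs)
    )


def pool(graphs):
    """Preprocessing step: reduce the graph set to (count, mean statistics)."""
    m = len(graphs)
    s = [stats(g) for g in graphs]
    return m, (sum(x[0] for x in s) / m, sum(x[1] for x in s) / m)


def loglik_pooled(theta, m, mean_stat):
    """Same log-likelihood, evaluated from the pooled summary alone."""
    return m * (theta[0] * mean_stat[0] + theta[1] * mean_stat[1] - log_z(theta))


def conjugate_update(n0, s0, m, mean_stat):
    """Posterior hyperparameters for a prior proportional to
    exp(n0 * (theta . s0 - log Z(theta)))."""
    n1 = n0 + m
    return n1, tuple((n0 * a + m * b) / n1 for a, b in zip(s0, mean_stat))
```

Once `pool` has run, evaluating `loglik_pooled` costs the same for three graphs as for three million, which is the sense in which the cost of fitting does not grow with the number of observations. On realistic networks `log_z` cannot be enumerated and would have to be approximated (for example by MCMC), which this sketch does not attempt.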