Highly scalable maximum likelihood and conjugate Bayesian inference for ERGMs on graph sets with equivalent vertices

The exponential family random graph modeling (ERGM) framework provides a highly flexible approach for the statistical analysis of networks (i.e., graphs). As ERGMs with dyadic dependence involve normalizing factors that are extremely costly to compute, practical strategies for ERGMs inference genera...

Bibliographic Details
Main Authors: Yin, Fan; Butts, Carter T.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9417041/
https://www.ncbi.nlm.nih.gov/pubmed/36018834
http://dx.doi.org/10.1371/journal.pone.0273039
author Yin, Fan
Butts, Carter T.
collection PubMed
description The exponential family random graph modeling (ERGM) framework provides a highly flexible approach for the statistical analysis of networks (i.e., graphs). As ERGMs with dyadic dependence involve normalizing factors that are extremely costly to compute, practical strategies for ERGMs inference generally employ a variety of approximations or other workarounds. Markov Chain Monte Carlo maximum likelihood (MCMC MLE) provides a powerful tool to approximate the maximum likelihood estimator (MLE) of ERGM parameters, and is generally feasible for typical models on single networks with as many as a few thousand nodes. MCMC-based algorithms for Bayesian analysis are more expensive, and high-quality answers are challenging to obtain on large graphs. For both strategies, extension to the pooled case—in which we observe multiple networks from a common generative process—adds further computational cost, with both time and memory scaling linearly in the number of graphs. This becomes prohibitive for large networks, or cases in which large numbers of graph observations are available. Here, we exploit some basic properties of the discrete exponential families to develop an approach for ERGM inference in the pooled case that (where applicable) allows an arbitrarily large number of graph observations to be fit at no additional computational cost beyond preprocessing the data itself. Moreover, a variant of our approach can also be used to perform Bayesian inference under conjugate priors, again with no additional computational cost in the estimation phase. The latter can be employed either for single graph observations, or for observations from graph sets. As we show, the conjugate prior is easily specified, and is well-suited to applications such as regularization. Simulation studies show that the pooled method leads to estimates with good frequentist properties, and posterior estimates under the conjugate prior are well-behaved. 
We demonstrate the usefulness of our approach with applications to pooled analysis of brain functional connectivity networks and to replicated x-ray crystal structures of hen egg-white lysozyme.
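Illustrative note (a sketch under stated assumptions, not the authors' implementation): one basic exponential family property consistent with the description above is sufficiency. For m graphs drawn independently from the same ERGM with sufficient statistic t, the pooled log-likelihood is, up to a term that does not depend on theta, m * (theta . t_bar - log Z(theta)), so the data enter only through the mean statistic t_bar, however large m is. The Python sketch below shows this for the edge-only (Bernoulli) model on undirected graphs, chosen here only because its normalizing factor is tractable and the MLE has a closed form. The function names are hypothetical, and the record does not say which models or code the authors used.

```python
import math
import random


def edge_count(adj):
    """Count edges of an undirected graph given as a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(i + 1, n))


def pooled_edge_mle(graphs):
    """Pooled MLE of the edge parameter for the edge-only ERGM (assumed model).

    Only the mean edge count over the graph set is needed; once it is
    computed, the estimation step costs the same for 2 graphs or 2 million.
    Assumes all graphs share the same vertex count and that the mean
    density lies strictly between 0 and 1 (otherwise the MLE is infinite).
    """
    n = len(graphs[0])
    n_dyads = n * (n - 1) // 2
    mean_edges = sum(edge_count(g) for g in graphs) / len(graphs)
    density = mean_edges / n_dyads
    return math.log(density / (1.0 - density))  # logit of the mean density


def random_graph(n, p, rng):
    """Draw an undirected Bernoulli(p) graph on n vertices (simulation helper)."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


# Simulate a pooled sample and recover the edge parameter.
rng = random.Random(0)
graphs = [random_graph(20, 0.3, rng) for _ in range(200)]
theta_hat = pooled_edge_mle(graphs)
```

With edge probability 0.3 the true parameter is logit(0.3), so theta_hat should land close to that value. For models with dyadic dependence the normalizing factor is no longer tractable, which is the setting the article addresses.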
format Online
Article
Text
id pubmed-9417041
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9417041 2022-08-27 PLoS One Research Article Public Library of Science 2022-08-26 /pmc/articles/PMC9417041/ /pubmed/36018834 http://dx.doi.org/10.1371/journal.pone.0273039 Text en © 2022 Yin, Butts https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Highly scalable maximum likelihood and conjugate Bayesian inference for ERGMs on graph sets with equivalent vertices
topic Research Article