Consistently estimating network statistics using aggregated relational data

Collecting complete network data is expensive, time-consuming, and often infeasible. Aggregated Relational Data (ARD), which ask respondents questions of the form “How many people with trait X do you know?”, provide a low-cost option when collecting complete network data is not possible. Rather than...

Bibliographic Details
Main Authors:	Breza, Emily, Chandrasekhar, Arun G., Lubold, Shane, McCormick, Tyler H., Pan, Mengjie
Format:	Online Article Text
Language:	English
Published:	National Academy of Sciences 2023
Subjects:	Social Sciences
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10214200/ https://www.ncbi.nlm.nih.gov/pubmed/37192169 http://dx.doi.org/10.1073/pnas.2207185120
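
The ARD question quoted in the abstract ("How many people with trait X do you know?") records only counts, not the identities of contacts. A minimal Python sketch of what such data look like when derived from a known network follows; the function name `aggregated_relational_data`, the one-trait-per-person assumption, and the toy network are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter


def aggregated_relational_data(adjacency, traits):
    """Return ARD: for each respondent, how many of their contacts have each trait.

    adjacency: dict mapping node -> set of neighbouring nodes (undirected graph)
    traits: dict mapping node -> trait label (one label per node, an assumption)
    """
    ard = {}
    for respondent, contacts in adjacency.items():
        # Only the counts per trait are kept; who the contacts are is discarded.
        ard[respondent] = Counter(traits[c] for c in contacts)
    return ard


# Toy example: a hypothetical 5-person network with two traits "A" and "B".
adjacency = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1},
    3: {1, 4},
    4: {3},
}
traits = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B"}

ard = aggregated_relational_data(adjacency, traits)
print(ard[1]["A"], ard[1]["B"])  # prints: 1 2
```

A `Counter` returns 0 for traits a respondent has no contacts with, which matches how an ARD answer of "none" would be recorded.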

author	Breza, Emily Chandrasekhar, Arun G. Lubold, Shane McCormick, Tyler H. Pan, Mengjie
collection	PubMed
description	Collecting complete network data is expensive, time-consuming, and often infeasible. Aggregated Relational Data (ARD), which ask respondents questions of the form “How many people with trait X do you know?”, provide a low-cost option when collecting complete network data is not possible. Rather than asking about connections between each pair of individuals directly, ARD collect the number of contacts the respondent knows with a given trait. Despite widespread use and a growing literature on ARD methodology, there is still no systematic understanding of when and why ARD should accurately recover features of the unobserved network. This paper provides such a characterization by deriving conditions under which statistics about the unobserved network (or functions of these statistics like regression coefficients) can be consistently estimated using ARD. We first provide consistent estimates of network model parameters for three commonly used probabilistic models: the beta-model with node-specific unobserved effects, the stochastic block model with unobserved community structure, and latent geometric space models with unobserved latent locations. A key observation is that cross-group link probabilities for a collection of (possibly unobserved) groups identify the model parameters, meaning ARD are sufficient for parameter estimation. With these estimated parameters, it is possible to simulate graphs from the fitted distribution and analyze the distribution of network statistics. We can then characterize conditions under which the simulated networks based on ARD will allow for consistent estimation of the unobserved network statistics, such as eigenvector centrality, or response functions by or of the unobserved network, such as regression coefficients.
format	Online Article Text
id	pubmed-10214200
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	National Academy of Sciences
record_format	MEDLINE/PubMed
spelling	pubmed-10214200 2023-11-16 Proc Natl Acad Sci U S A Social Sciences National Academy of Sciences 2023-05-16 2023-05-23 /pmc/articles/PMC10214200/ /pubmed/37192169 http://dx.doi.org/10.1073/pnas.2207185120 Text en Copyright © 2023 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Consistently estimating network statistics using aggregated relational data
topic	Social Sciences
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10214200/ https://www.ncbi.nlm.nih.gov/pubmed/37192169 http://dx.doi.org/10.1073/pnas.2207185120
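
The description outlines a pipeline: estimate model parameters from ARD, simulate graphs from the fitted model, then compute a network statistic such as eigenvector centrality. As a rough illustration for the stochastic block model only, here is a minimal Python sketch. It assumes, unlike the paper's more general setting with possibly unobserved groups, that every node's block is known and that every node answers the ARD question for every block; the function names and the simple moment estimator are illustrative choices, not the authors' estimator.

```python
import random


def sample_sbm(groups, P, rng):
    """Draw an undirected graph (list of neighbour sets) from a stochastic block model.

    groups[i] is the block of node i; P[a][b] is the link probability between blocks.
    """
    n = len(groups)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < P[groups[i]][groups[j]]:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def ard_counts(adj, groups, k):
    """ARD: y[i][b] = number of node i's contacts who belong to block b."""
    y = []
    for contacts in adj:
        row = [0] * k
        for c in contacts:
            row[groups[c]] += 1
        y.append(row)
    return y


def estimate_block_probs(y, groups, k):
    """Moment estimate of P[a][b] from ARD alone, assuming respondents' blocks are known.

    Under the SBM, E[y[i][b]] = (n_b - [a == b]) * P[a][b] for i in block a, so the
    summed ARD of block a divided by the number of ordered (respondent, contact)
    pairs estimates the cross-group link probability.
    """
    sizes = [groups.count(a) for a in range(k)]
    totals = [[0] * k for _ in range(k)]
    for i, row in enumerate(y):
        for b in range(k):
            totals[groups[i]][b] += row[b]
    P_hat = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            pairs = sizes[a] * (sizes[b] - (1 if a == b else 0))
            P_hat[a][b] = totals[a][b] / pairs
    return P_hat


def eigenvector_centrality(adj, iters=100):
    """Power iteration on A + I; the identity shift avoids oscillation, same eigenvectors."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        new = [x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        s = sum(new)
        x = [v / s for v in new]  # normalise so the centralities sum to 1
    return x


def group_share(cent, groups, a):
    """Share of total eigenvector centrality held by nodes in block a."""
    return sum(c for c, g in zip(cent, groups) if g == a)


# Hypothetical setup: two blocks of 150 nodes with made-up link probabilities.
rng = random.Random(0)
groups = [0] * 150 + [1] * 150
P_true = [[0.10, 0.02], [0.02, 0.08]]

# The "unobserved" network; only its ARD are passed to the estimator.
true_graph = sample_sbm(groups, P_true, rng)
y = ard_counts(true_graph, groups, k=2)
P_hat = estimate_block_probs(y, groups, k=2)

# Simulate graphs from the fitted model and average the statistic across draws.
true_share = group_share(eigenvector_centrality(true_graph), groups, 0)
sim_shares = [
    group_share(eigenvector_centrality(sample_sbm(groups, P_hat, rng)), groups, 0)
    for _ in range(10)
]
sim_share = sum(sim_shares) / len(sim_shares)
print([[round(p, 3) for p in row] for row in P_hat])
print(round(true_share, 3), round(sim_share, 3))
```

Because nodes in the same block are exchangeable under the SBM, the sketch compares a block-level summary (the share of eigenvector centrality held by block 0) rather than node-by-node centralities. The seed, block sizes, probabilities, and number of simulated graphs are arbitrary.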
