Cargando…

Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations

BACKGROUND: In the framework of network sampling, random walk (RW) based estimation techniques provide many pragmatic solutions while uncovering the unknown network as little as possible. Despite several theoretical advances in this area, RW based sampling techniques usually make a strong assumption...

Descripción completa

Detalles Bibliográficos
Autores principales:	Avrachenkov, Konstantin, Borkar, Vivek S., Kadavankandy, Arun, Sreedharan, Jithin K.
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Springer International Publishing 2018
Materias:	Research
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5857599/ https://www.ncbi.nlm.nih.gov/pubmed/29578546 http://dx.doi.org/10.1186/s40649-018-0051-0

_version_	1783307495363575808
author	Avrachenkov, Konstantin Borkar, Vivek S. Kadavankandy, Arun Sreedharan, Jithin K.
author_facet	Avrachenkov, Konstantin Borkar, Vivek S. Kadavankandy, Arun Sreedharan, Jithin K.
author_sort	Avrachenkov, Konstantin
collection	PubMed
description	BACKGROUND: In the framework of network sampling, random walk (RW) based estimation techniques provide many pragmatic solutions while uncovering the unknown network as little as possible. Despite several theoretical advances in this area, RW based sampling techniques usually make a strong assumption that the samples are in stationary regime, and hence are impelled to leave out the samples collected during the burn-in period. METHODS: This work proposes two sampling schemes without burn-in time constraint to estimate the average of an arbitrary function defined on the network nodes, for example, the average age of users in a social network. The central idea of the algorithms lies in exploiting regeneration of RWs at revisits to an aggregated super-node or to a set of nodes, and in strategies to enhance the frequency of such regenerations either by contracting the graph or by making the hitting set larger. Our first algorithm, which is based on reinforcement learning (RL), uses stochastic approximation to derive an estimator. This method can be seen as intermediate between purely stochastic Markov chain Monte Carlo iterations and deterministic relative value iterations. The second algorithm, which we call the Ratio with Tours (RT)-estimator, is a modified form of respondent-driven sampling (RDS) that accommodates the idea of regeneration. RESULTS: We study the methods via simulations on real networks. We observe that the trajectories of RL-estimator are much more stable than those of standard random walk based estimation procedures, and its error performance is comparable to that of respondent-driven sampling (RDS) which has a smaller asymptotic variance than many other estimators. Simulation studies also show that the mean squared error of RT-estimator decays much faster than that of RDS with time. CONCLUSION: The newly developed RW based estimators (RL- and RT-estimators) allow to avoid burn-in period, provide better control of stability along the sample path, and overall reduce the estimation time. Our estimators can be applied in social and complex networks.
format	Online Article Text
id	pubmed-5857599
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-58575992018-03-21 Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations Avrachenkov, Konstantin Borkar, Vivek S. Kadavankandy, Arun Sreedharan, Jithin K. Comput Soc Netw Research BACKGROUND: In the framework of network sampling, random walk (RW) based estimation techniques provide many pragmatic solutions while uncovering the unknown network as little as possible. Despite several theoretical advances in this area, RW based sampling techniques usually make a strong assumption that the samples are in stationary regime, and hence are impelled to leave out the samples collected during the burn-in period. METHODS: This work proposes two sampling schemes without burn-in time constraint to estimate the average of an arbitrary function defined on the network nodes, for example, the average age of users in a social network. The central idea of the algorithms lies in exploiting regeneration of RWs at revisits to an aggregated super-node or to a set of nodes, and in strategies to enhance the frequency of such regenerations either by contracting the graph or by making the hitting set larger. Our first algorithm, which is based on reinforcement learning (RL), uses stochastic approximation to derive an estimator. This method can be seen as intermediate between purely stochastic Markov chain Monte Carlo iterations and deterministic relative value iterations. The second algorithm, which we call the Ratio with Tours (RT)-estimator, is a modified form of respondent-driven sampling (RDS) that accommodates the idea of regeneration. RESULTS: We study the methods via simulations on real networks. We observe that the trajectories of RL-estimator are much more stable than those of standard random walk based estimation procedures, and its error performance is comparable to that of respondent-driven sampling (RDS) which has a smaller asymptotic variance than many other estimators. Simulation studies also show that the mean squared error of RT-estimator decays much faster than that of RDS with time. CONCLUSION: The newly developed RW based estimators (RL- and RT-estimators) allow to avoid burn-in period, provide better control of stability along the sample path, and overall reduce the estimation time. Our estimators can be applied in social and complex networks. Springer International Publishing 2018-03-19 2018 /pmc/articles/PMC5857599/ /pubmed/29578546 http://dx.doi.org/10.1186/s40649-018-0051-0 Text en © The Author(s) 2018 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle	Research Avrachenkov, Konstantin Borkar, Vivek S. Kadavankandy, Arun Sreedharan, Jithin K. Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations
title	Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations
title_full	Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations
title_fullStr	Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations
title_full_unstemmed	Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations
title_short	Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations
title_sort	revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5857599/ https://www.ncbi.nlm.nih.gov/pubmed/29578546 http://dx.doi.org/10.1186/s40649-018-0051-0
work_keys_str_mv	AT avrachenkovkonstantin revisitingrandomwalkbasedsamplinginnetworksevasionofburninperiodandfrequentregenerations AT borkarviveks revisitingrandomwalkbasedsamplinginnetworksevasionofburninperiodandfrequentregenerations AT kadavankandyarun revisitingrandomwalkbasedsamplinginnetworksevasionofburninperiodandfrequentregenerations AT sreedharanjithink revisitingrandomwalkbasedsamplinginnetworksevasionofburninperiodandfrequentregenerations

Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations

Ejemplares similares