
Parsing Social Network Survey Data from Hidden Populations Using Stochastic Context-Free Grammars

BACKGROUND: Human populations are structured by social networks, in which individuals tend to form relationships based on shared attributes. Certain attributes that are ambiguous, stigmatized or illegal can create a ‘hidden’ population, so-called because its members are difficult to identify. Many h...


Bibliographic Details
Main Authors:	Poon, Art F. Y., Brouwer, Kimberly C., Strathdee, Steffanie A., Firestone-Cruz, Michelle, Lozada, Remedios M., Kosakovsky Pond, Sergei L., Heckathorn, Douglas D., Frost, Simon D. W.
Format:	Text
Language:	English
Published:	Public Library of Science 2009
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2734164/ https://www.ncbi.nlm.nih.gov/pubmed/19738904 http://dx.doi.org/10.1371/journal.pone.0006777

_version_	1782171135666814976
author	Poon, Art F. Y. Brouwer, Kimberly C. Strathdee, Steffanie A. Firestone-Cruz, Michelle Lozada, Remedios M. Kosakovsky Pond, Sergei L. Heckathorn, Douglas D. Frost, Simon D. W.
author_sort	Poon, Art F. Y.
collection	PubMed
description	BACKGROUND: Human populations are structured by social networks, in which individuals tend to form relationships based on shared attributes. Certain attributes that are ambiguous, stigmatized or illegal can create a ‘hidden’ population, so-called because its members are difficult to identify. Many hidden populations are also at an elevated risk of exposure to infectious diseases. Consequently, public health agencies are presently adopting modern survey techniques that traverse social networks in hidden populations by soliciting individuals to recruit their peers, e.g., respondent-driven sampling (RDS). The concomitant accumulation of network-based epidemiological data, however, is rapidly outpacing the development of computational methods for analysis. Moreover, current analytical models rely on unrealistic assumptions, e.g., that the traversal of social networks can be modeled by a Markov chain rather than a branching process. METHODOLOGY/PRINCIPAL FINDINGS: Here, we develop a new methodology based on stochastic context-free grammars (SCFGs), which are well-suited to modeling tree-like structure of the RDS recruitment process. We apply this methodology to an RDS case study of injection drug users (IDUs) in Tijuana, México, a hidden population at high risk of blood-borne and sexually-transmitted infections (i.e., HIV, hepatitis C virus, syphilis). Survey data were encoded as text strings that were parsed using our custom implementation of the inside-outside algorithm in a publicly-available software package (HyPhy), which uses either expectation maximization or direct optimization methods and permits constraints on model parameters for hypothesis testing. We identified significant latent variability in the recruitment process that violates assumptions of Markov chain-based methods for RDS analysis: firstly, IDUs tended to emulate the recruitment behavior of their own recruiter; and secondly, the recruitment of like peers (homophily) was dependent on the number of recruits. CONCLUSIONS: SCFGs provide a rich probabilistic language that can articulate complex latent structure in survey data derived from the traversal of social networks. Such structure that has no representation in Markov chain-based models can interfere with the estimation of the composition of hidden populations if left unaccounted for, raising critical implications for the prevention and control of infectious disease epidemics.
format	Text
id	pubmed-2734164
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-2734164 2009-09-07 PLoS One Research Article Public Library of Science 2009-09-07 /pmc/articles/PMC2734164/ /pubmed/19738904 http://dx.doi.org/10.1371/journal.pone.0006777 Text en Poon et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Parsing Social Network Survey Data from Hidden Populations Using Stochastic Context-Free Grammars
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2734164/ https://www.ncbi.nlm.nih.gov/pubmed/19738904 http://dx.doi.org/10.1371/journal.pone.0006777

