
GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data

BACKGROUND: Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Surveys are typically selected from recent census data; however, census data are often outdated or inaccurate. This paper describes how...


Bibliographic Details
Main authors: Thomson, Dana R., Stevens, Forrest R., Ruktanonchai, Nick W., Tatem, Andrew J., Castro, Marcia C.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5518145/
https://www.ncbi.nlm.nih.gov/pubmed/28724433
http://dx.doi.org/10.1186/s12942-017-0098-4
author Thomson, Dana R.
Stevens, Forrest R.
Ruktanonchai, Nick W.
Tatem, Andrew J.
Castro, Marcia C.
collection PubMed
description BACKGROUND: Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Surveys are typically selected from recent census data; however, census data are often outdated or inaccurate. This paper describes how gridded population data might instead be used as a sample frame, and introduces the R GridSample algorithm for selecting primary sampling units (PSU) for complex household surveys with gridded population data. With a gridded population dataset and geographic boundary of the study area, GridSample allows a two-step process to sample “seed” cells with probability proportionate to estimated population size, then “grows” PSUs until a minimum population is achieved in each PSU. The algorithm permits stratification and oversampling of urban or rural areas. The approximately uniform size and shape of grid cells allows for spatial oversampling, not possible in typical surveys, possibly improving small area estimates with survey results. RESULTS: We replicated the 2010 Rwanda Demographic and Health Survey (DHS) in GridSample by sampling the WorldPop 2010 UN-adjusted 100 m × 100 m gridded population dataset, stratifying by Rwanda’s 30 districts, and oversampling in urban areas. The 2010 Rwanda DHS had 79 urban PSUs, 413 rural PSUs, with an average PSU population of 610 people. An equivalent sample in GridSample had 75 urban PSUs, 405 rural PSUs, and a median PSU population of 612 people. The number of PSUs differed because DHS added urban PSUs from specific districts while GridSample reallocated rural-to-urban PSUs across all districts. CONCLUSIONS: Gridded population sampling is a promising alternative to typical census-based sampling when census data are moderately outdated or inaccurate. 
Four approaches to implementation have been tried: (1) using gridded PSU boundaries produced by GridSample, (2) manually segmenting gridded PSUs using satellite imagery, (3) non-probability sampling (e.g. random-walk, “spin-the-pen”), and (4) random sampling of households. Gridded population sampling is in its infancy, and further research is needed to assess its accuracy and feasibility. The GridSample R algorithm can be used to advance this research agenda. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12942-017-0098-4) contains supplementary material, which is available to authorized users.
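The two-step selection summarized in the description above (seed cells drawn with probability proportionate to estimated population, then PSUs "grown" until a minimum population is reached) can be illustrated with a small sketch. This is not the GridSample R package or its API. It is a minimal Python illustration under stated assumptions: the grid is a plain list of lists, seeds are drawn by successive weighted sampling without replacement, and a PSU grows breadth-first through its four edge-adjacent neighbours. The function names `sample_seed_cells`, `grow_psu`, and `select_psus` are hypothetical, and the abstract does not specify the package's actual growth rule.

```python
import random
from collections import deque


def sample_seed_cells(pop, n_seeds, rng):
    """Draw up to n_seeds distinct populated cells, each draw weighted by
    cell population (successive sampling without replacement; an assumption,
    not necessarily the scheme GridSample uses)."""
    cells = [(r, c) for r, row in enumerate(pop)
             for c, p in enumerate(row) if p > 0]
    weights = [pop[r][c] for r, c in cells]
    seeds = []
    while len(seeds) < n_seeds and cells:
        i = rng.choices(range(len(cells)), weights=weights, k=1)[0]
        seeds.append(cells.pop(i))
        weights.pop(i)
    return seeds


def grow_psu(pop, seed, min_pop, taken):
    """Grow one PSU from its seed by adding free 4-neighbour cells
    breadth-first until the PSU population reaches min_pop.
    A PSU hemmed in by other PSUs may stop short of min_pop."""
    n_rows, n_cols = len(pop), len(pop[0])
    taken.add(seed)
    psu = [seed]
    total = pop[seed[0]][seed[1]]
    frontier = deque([seed])
    while total < min_pop and frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            cell = (r + dr, c + dc)
            inside = 0 <= cell[0] < n_rows and 0 <= cell[1] < n_cols
            if inside and cell not in taken:
                taken.add(cell)
                psu.append(cell)
                frontier.append(cell)
                total += pop[cell[0]][cell[1]]
                if total >= min_pop:
                    break
    return psu, total


def select_psus(pop, n_psus, min_pop, seed=0):
    """Step 1: sample seed cells. Step 2: grow each seed into a PSU.
    All seeds are reserved first so no PSU absorbs another PSU's seed."""
    rng = random.Random(seed)
    seeds = sample_seed_cells(pop, n_psus, rng)
    taken = set(seeds)
    return [grow_psu(pop, s, min_pop, taken) for s in seeds]


if __name__ == "__main__":
    # Toy 6 x 6 grid with 100 people per cell (made-up numbers).
    toy_grid = [[100] * 6 for _ in range(6)]
    for cells, total in select_psus(toy_grid, n_psus=3, min_pop=600):
        print(total, cells)
```

Stratification, as described in the abstract, would amount to running a selection like `select_psus` separately within each stratum (for example, each district) with its own PSU count. That allocation step is left out of the sketch.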
format Online
Article
Text
id pubmed-5518145
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-55181452017-08-16 GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data Thomson, Dana R. Stevens, Forrest R. Ruktanonchai, Nick W. Tatem, Andrew J. Castro, Marcia C. Int J Health Geogr Methodology BACKGROUND: Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Surveys are typically selected from recent census data; however, census data are often outdated or inaccurate. This paper describes how gridded population data might instead be used as a sample frame, and introduces the R GridSample algorithm for selecting primary sampling units (PSU) for complex household surveys with gridded population data. With a gridded population dataset and geographic boundary of the study area, GridSample allows a two-step process to sample “seed” cells with probability proportionate to estimated population size, then “grows” PSUs until a minimum population is achieved in each PSU. The algorithm permits stratification and oversampling of urban or rural areas. The approximately uniform size and shape of grid cells allows for spatial oversampling, not possible in typical surveys, possibly improving small area estimates with survey results. RESULTS: We replicated the 2010 Rwanda Demographic and Health Survey (DHS) in GridSample by sampling the WorldPop 2010 UN-adjusted 100 m × 100 m gridded population dataset, stratifying by Rwanda’s 30 districts, and oversampling in urban areas. The 2010 Rwanda DHS had 79 urban PSUs, 413 rural PSUs, with an average PSU population of 610 people. An equivalent sample in GridSample had 75 urban PSUs, 405 rural PSUs, and a median PSU population of 612 people. The number of PSUs differed because DHS added urban PSUs from specific districts while GridSample reallocated rural-to-urban PSUs across all districts. 
CONCLUSIONS: Gridded population sampling is a promising alternative to typical census-based sampling when census data are moderately outdated or inaccurate. Four approaches to implementation have been tried: (1) using gridded PSU boundaries produced by GridSample, (2) manually segmenting gridded PSUs using satellite imagery, (3) non-probability sampling (e.g. random-walk, “spin-the-pen”), and (4) random sampling of households. Gridded population sampling is in its infancy, and further research is needed to assess its accuracy and feasibility. The GridSample R algorithm can be used to advance this research agenda. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12942-017-0098-4) contains supplementary material, which is available to authorized users. BioMed Central 2017-07-19 /pmc/articles/PMC5518145/ /pubmed/28724433 http://dx.doi.org/10.1186/s12942-017-0098-4 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5518145/
https://www.ncbi.nlm.nih.gov/pubmed/28724433
http://dx.doi.org/10.1186/s12942-017-0098-4