GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data
BACKGROUND: Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Surveys are typically selected from recent census data; however, census data are often outdated or inaccurate. This paper describes how...
Main authors: | Thomson, Dana R.; Stevens, Forrest R.; Ruktanonchai, Nick W.; Tatem, Andrew J.; Castro, Marcia C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5518145/ https://www.ncbi.nlm.nih.gov/pubmed/28724433 http://dx.doi.org/10.1186/s12942-017-0098-4 |
_version_ | 1783251435364810752 |
---|---|
author | Thomson, Dana R. Stevens, Forrest R. Ruktanonchai, Nick W. Tatem, Andrew J. Castro, Marcia C. |
author_facet | Thomson, Dana R. Stevens, Forrest R. Ruktanonchai, Nick W. Tatem, Andrew J. Castro, Marcia C. |
author_sort | Thomson, Dana R. |
collection | PubMed |
description | BACKGROUND: Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Surveys are typically selected from recent census data; however, census data are often outdated or inaccurate. This paper describes how gridded population data might instead be used as a sample frame, and introduces the R GridSample algorithm for selecting primary sampling units (PSU) for complex household surveys with gridded population data. With a gridded population dataset and geographic boundary of the study area, GridSample allows a two-step process to sample “seed” cells with probability proportionate to estimated population size, then “grows” PSUs until a minimum population is achieved in each PSU. The algorithm permits stratification and oversampling of urban or rural areas. The approximately uniform size and shape of grid cells allows for spatial oversampling, not possible in typical surveys, possibly improving small area estimates with survey results. RESULTS: We replicated the 2010 Rwanda Demographic and Health Survey (DHS) in GridSample by sampling the WorldPop 2010 UN-adjusted 100 m × 100 m gridded population dataset, stratifying by Rwanda’s 30 districts, and oversampling in urban areas. The 2010 Rwanda DHS had 79 urban PSUs, 413 rural PSUs, with an average PSU population of 610 people. An equivalent sample in GridSample had 75 urban PSUs, 405 rural PSUs, and a median PSU population of 612 people. The number of PSUs differed because DHS added urban PSUs from specific districts while GridSample reallocated rural-to-urban PSUs across all districts. CONCLUSIONS: Gridded population sampling is a promising alternative to typical census-based sampling when census data are moderately outdated or inaccurate. Four approaches to implementation have been tried: (1) using gridded PSU boundaries produced by GridSample, (2) manually segmenting gridded PSU using satellite imagery, (3) non-probability sampling (e.g. random-walk, “spin-the-pen”), and (4) random sampling of households. Gridded population sampling is in its infancy, and further research is needed to assess the accuracy and feasibility of gridded population sampling. The GridSample R algorithm can be used to forward this research agenda. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12942-017-0098-4) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5518145 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5518145 2017-08-16 GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data Thomson, Dana R. Stevens, Forrest R. Ruktanonchai, Nick W. Tatem, Andrew J. Castro, Marcia C. Int J Health Geogr Methodology BACKGROUND: Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Surveys are typically selected from recent census data; however, census data are often outdated or inaccurate. This paper describes how gridded population data might instead be used as a sample frame, and introduces the R GridSample algorithm for selecting primary sampling units (PSU) for complex household surveys with gridded population data. With a gridded population dataset and geographic boundary of the study area, GridSample allows a two-step process to sample “seed” cells with probability proportionate to estimated population size, then “grows” PSUs until a minimum population is achieved in each PSU. The algorithm permits stratification and oversampling of urban or rural areas. The approximately uniform size and shape of grid cells allows for spatial oversampling, not possible in typical surveys, possibly improving small area estimates with survey results. RESULTS: We replicated the 2010 Rwanda Demographic and Health Survey (DHS) in GridSample by sampling the WorldPop 2010 UN-adjusted 100 m × 100 m gridded population dataset, stratifying by Rwanda’s 30 districts, and oversampling in urban areas. The 2010 Rwanda DHS had 79 urban PSUs, 413 rural PSUs, with an average PSU population of 610 people. An equivalent sample in GridSample had 75 urban PSUs, 405 rural PSUs, and a median PSU population of 612 people. The number of PSUs differed because DHS added urban PSUs from specific districts while GridSample reallocated rural-to-urban PSUs across all districts. CONCLUSIONS: Gridded population sampling is a promising alternative to typical census-based sampling when census data are moderately outdated or inaccurate. Four approaches to implementation have been tried: (1) using gridded PSU boundaries produced by GridSample, (2) manually segmenting gridded PSU using satellite imagery, (3) non-probability sampling (e.g. random-walk, “spin-the-pen”), and (4) random sampling of households. Gridded population sampling is in its infancy, and further research is needed to assess the accuracy and feasibility of gridded population sampling. The GridSample R algorithm can be used to forward this research agenda. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12942-017-0098-4) contains supplementary material, which is available to authorized users. BioMed Central 2017-07-19 /pmc/articles/PMC5518145/ /pubmed/28724433 http://dx.doi.org/10.1186/s12942-017-0098-4 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Thomson, Dana R. Stevens, Forrest R. Ruktanonchai, Nick W. Tatem, Andrew J. Castro, Marcia C. GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data |
title | GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data |
title_full | GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data |
title_fullStr | GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data |
title_full_unstemmed | GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data |
title_short | GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data |
title_sort | gridsample: an r package to generate household survey primary sampling units (psus) from gridded population data |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5518145/ https://www.ncbi.nlm.nih.gov/pubmed/28724433 http://dx.doi.org/10.1186/s12942-017-0098-4 |
work_keys_str_mv | AT thomsondanar gridsampleanrpackagetogeneratehouseholdsurveyprimarysamplingunitspsusfromgriddedpopulationdata AT stevensforrestr gridsampleanrpackagetogeneratehouseholdsurveyprimarysamplingunitspsusfromgriddedpopulationdata AT ruktanonchainickw gridsampleanrpackagetogeneratehouseholdsurveyprimarysamplingunitspsusfromgriddedpopulationdata AT tatemandrewj gridsampleanrpackagetogeneratehouseholdsurveyprimarysamplingunitspsusfromgriddedpopulationdata AT castromarciac gridsampleanrpackagetogeneratehouseholdsurveyprimarysamplingunitspsusfromgriddedpopulationdata |
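
The two-step process described in the abstract (sample "seed" cells with probability proportionate to estimated population, then "grow" each PSU until it reaches a minimum population) can be illustrated with a small sketch. This is not the GridSample R code: the function names `sample_seed_cells`, `grow_psu`, and `sample_psus` are hypothetical, the population grid is a toy list of lists, and both the successive weighted draw without replacement and the 4-neighbour breadth-first growth rule are assumptions chosen for simplicity.

```python
import random


def sample_seed_cells(pop, n_seeds, rng):
    """Draw up to n_seeds distinct populated cells, each draw weighted by cell population."""
    cells = [(r, c) for r, row in enumerate(pop) for c, p in enumerate(row) if p > 0]
    weights = [pop[r][c] for r, c in cells]
    seeds = []
    while len(seeds) < n_seeds and cells:
        # Assumption: successive weighted draws without replacement.
        i = rng.choices(range(len(cells)), weights=weights, k=1)[0]
        seeds.append(cells.pop(i))
        weights.pop(i)
    return seeds


def grow_psu(pop, seed, min_pop, taken):
    """Add free 4-neighbour cells breadth-first until min_pop is reached or no cells remain."""
    rows, cols = len(pop), len(pop[0])
    psu, total = [seed], pop[seed[0]][seed[1]]
    taken.add(seed)
    frontier = [seed]
    while total < min_pop and frontier:
        r, c = frontier.pop(0)
        for cell in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if total >= min_pop:
                break
            nr, nc = cell
            if 0 <= nr < rows and 0 <= nc < cols and cell not in taken:
                taken.add(cell)  # a cell belongs to at most one PSU
                psu.append(cell)
                frontier.append(cell)
                total += pop[nr][nc]
    return psu, total


def sample_psus(pop, n_psus, min_pop, rng):
    """Seed with population-weighted draws, then grow each seed into a PSU."""
    seeds = sample_seed_cells(pop, n_psus, rng)
    taken = set(seeds)  # reserve every seed so PSUs cannot absorb each other's seed
    return [grow_psu(pop, seed, min_pop, taken) for seed in seeds]


# Toy 8 x 8 grid with 10 people per cell (illustrative values only).
grid = [[10] * 8 for _ in range(8)]
for cells, total in sample_psus(grid, 3, 50, random.Random(1)):
    print(len(cells), "cells, estimated population", total)
```

In this sketch a PSU can fall short of the minimum population if neighbouring PSUs have already claimed every adjacent cell; a production implementation would need a rule for that case, and for stratification and urban or rural oversampling, none of which is modelled here.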