
Optimal two-stage sampling for mean estimation in multilevel populations when cluster size is informative


Bibliographic Details
Main Authors: Innocenti, Francesco; Candel, Math JJM; Tan, Frans ES; van Breukelen, Gerard JP
Format: Online Article Text
Language: English
Published: SAGE Publications 2020
Subjects: Articles
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8172256/
https://www.ncbi.nlm.nih.gov/pubmed/32940135
http://dx.doi.org/10.1177/0962280220952833
author Innocenti, Francesco
Candel, Math JJM
Tan, Frans ES
van Breukelen, Gerard JP
collection PubMed
description To estimate the mean of a quantitative variable in a hierarchical population, it is logistically convenient to sample in two stages (two-stage sampling), i.e. selecting first clusters, and then individuals from the sampled clusters. Allowing cluster size to vary in the population and to be related to the mean of the outcome variable of interest (informative cluster size), the following competing sampling designs are considered: sampling clusters with probability proportional to cluster size, and then the same number of individuals per cluster; drawing clusters with equal probability, and then the same percentage of individuals per cluster; and selecting clusters with equal probability, and then the same number of individuals per cluster. For each design, optimal sample sizes are derived under a budget constraint. The three optimal two-stage sampling designs are compared, in terms of efficiency, with each other and with simple random sampling of individuals. Sampling clusters with probability proportional to size is recommended. To overcome the dependency of the optimal design on unknown nuisance parameters, maximin designs are derived. The results are illustrated, assuming probability proportional to size sampling of clusters, with the planning of a hypothetical survey to compare adolescent alcohol consumption between France and Italy.
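The three two-stage designs named in the description can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: the synthetic population, the coefficient linking cluster size to cluster mean, the sample sizes, and the estimator paired with each design are all assumptions made for illustration, and the target is assumed to be the mean over all individuals. It does not reproduce the optimal sample sizes under a budget constraint or the maximin designs.

```python
import random


def make_population(rng, n_clusters=200):
    """Hypothetical population in which larger clusters have higher means."""
    population = []
    for _ in range(n_clusters):
        size = rng.randint(10, 100)
        # Informative cluster size: the cluster mean depends on the size (assumed form).
        cluster_mean = 0.05 * size + rng.gauss(0.0, 0.5)
        population.append([cluster_mean + rng.gauss(0.0, 1.0) for _ in range(size)])
    return population


def mean(values):
    return sum(values) / len(values)


def pps_fixed_n(rng, population, k, n):
    """Clusters drawn with probability proportional to size (with replacement),
    then n individuals per cluster; the plain mean of cluster sample means."""
    sizes = [len(cluster) for cluster in population]
    chosen = rng.choices(population, weights=sizes, k=k)
    return mean([mean(rng.sample(cluster, n)) for cluster in chosen])


def equal_prob_fixed_fraction(rng, population, k, fraction):
    """Clusters drawn with equal probability, then the same percentage of
    individuals per cluster; the pooled mean of all sampled individuals."""
    chosen = rng.sample(population, k)
    sampled = []
    for cluster in chosen:
        m = max(1, round(fraction * len(cluster)))
        sampled.extend(rng.sample(cluster, m))
    return mean(sampled)


def equal_prob_fixed_n(rng, population, k, n):
    """Clusters drawn with equal probability, then n individuals per cluster;
    cluster sample means are weighted by cluster size."""
    chosen = rng.sample(population, k)
    weighted_sum = sum(len(c) * mean(rng.sample(c, n)) for c in chosen)
    return weighted_sum / sum(len(c) for c in chosen)


def naive_unweighted(rng, population, k, n):
    """Same sampling as equal_prob_fixed_n but ignoring cluster size,
    shown only to make the effect of informative cluster size visible."""
    chosen = rng.sample(population, k)
    return mean([mean(rng.sample(c, n)) for c in chosen])


rng = random.Random(2020)
population = make_population(rng)
true_mean = mean([y for cluster in population for y in cluster])

reps = 2000
results = {
    "PPS, fixed n": mean([pps_fixed_n(rng, population, 20, 5) for _ in range(reps)]),
    "equal prob., fixed fraction": mean(
        [equal_prob_fixed_fraction(rng, population, 20, 0.1) for _ in range(reps)]
    ),
    "equal prob., fixed n, size-weighted": mean(
        [equal_prob_fixed_n(rng, population, 20, 5) for _ in range(reps)]
    ),
    "naive": mean([naive_unweighted(rng, population, 20, 5) for _ in range(reps)]),
}

print(f"true mean of all individuals: {true_mean:.3f}")
for name, value in results.items():
    print(f"{name}: average estimate {value:.3f}")
```

In this sketch the first three averages should land close to the true mean, while the naive unweighted average should fall below it because small clusters, which have lower means here, are over-represented. The sample sizes are not matched on cost, so the output says nothing about which design is most efficient.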
format Online
Article
Text
id pubmed-8172256
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-8172256 2021-06-21 Stat Methods Med Res Articles
SAGE Publications 2020-09-17 2021-02 /pmc/articles/PMC8172256/ /pubmed/32940135 http://dx.doi.org/10.1177/0962280220952833 Text en © The Author(s) 2020. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title Optimal two-stage sampling for mean estimation in multilevel populations when cluster size is informative
topic Articles