A simulation study of sample size for multilevel logistic regression models

Bibliographic Details
Main authors:	Moineddin, Rahim; Matheson, Flora I; Glazier, Richard H
Format:	Text
Language:	English
Published:	BioMed Central, 2007
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1955447/ https://www.ncbi.nlm.nih.gov/pubmed/17634107 http://dx.doi.org/10.1186/1471-2288-7-34

author	Moineddin, Rahim; Matheson, Flora I; Glazier, Richard H
collection	PubMed
description	BACKGROUND: Many studies conducted in the health and social sciences collect individual-level data as outcome measures. Such data usually have a hierarchical structure, with patients clustered within physicians and physicians clustered within practices. Large survey data, including national surveys, also have a hierarchical or clustered structure: respondents are naturally clustered in geographical units (e.g., health regions) and may be grouped into smaller units. Outcomes of interest in many fields include not only continuous measures but also binary outcomes such as depression, presence or absence of a disease, and self-reported general health. In multilevel studies, an important problem is calculating a sample size adequate to generate unbiased and accurate estimates. METHODS: In this paper, simulation studies are used to assess the effect of varying sample size at both the individual and group level on the accuracy of the estimates of the parameters and variance components of multilevel logistic regression models. The influence of the prevalence of the outcome and of the intra-class correlation coefficient (ICC) is also examined. RESULTS: The results show that the estimates of the fixed-effect parameters are unbiased for 100 groups with a group size of 50 or higher. The estimates of the variance-covariance components are slightly biased even with 100 groups and a group size of 50. The biases for both fixed and random effects are severe for a group size of 5. The standard errors of the fixed-effect parameters are unbiased, while those of the variance-covariance components are underestimated. The results suggest that low-prevalence events require larger sample sizes, with a minimum of 100 groups and 50 individuals per group. CONCLUSION: We recommend a minimum group size of 50 with at least 50 groups to produce valid estimates for multilevel logistic regression models.
Group size should be adjusted when the prevalence of events is low, so that the expected number of events in each group is greater than one.
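The two quantities the abstract varies (ICC and prevalence) and its closing group-size rule can be illustrated with a minimal Python sketch. It assumes a random-intercept logistic model with the ICC defined on the latent scale, ICC = σ²_u / (σ²_u + π²/3), and treats prevalence as a single event rate per individual. The function names, the standard-normal covariate, and all parameter values are hypothetical illustrations, not the authors' simulation design (the abstract's variance-covariance components suggest the paper also used random slopes), and fitting a model to each replicate is left out.

```python
import math
import random

# Variance of the standard logistic distribution (latent-scale level-1 variance).
LOGISTIC_VAR = math.pi ** 2 / 3


def sigma2_u_from_icc(icc: float) -> float:
    """Random-intercept variance giving a latent-scale ICC (assumed definition)."""
    if not 0.0 <= icc < 1.0:
        raise ValueError("icc must be in [0, 1)")
    return icc * LOGISTIC_VAR / (1.0 - icc)


def min_group_size(prevalence: float, min_events: float = 1.0) -> int:
    """Smallest group size n whose expected events n * prevalence exceed min_events."""
    return math.floor(min_events / prevalence) + 1


def simulate(n_groups, group_size, beta0, beta1, icc, seed=0):
    """Hypothetical generator: y_ij ~ Bernoulli(expit(beta0 + beta1*x_ij + u_j)),
    with u_j ~ N(0, sigma2_u) and x_ij ~ N(0, 1). Returns (group, x, y) tuples."""
    rng = random.Random(seed)
    sd_u = math.sqrt(sigma2_u_from_icc(icc))
    data = []
    for j in range(n_groups):
        u_j = rng.gauss(0.0, sd_u)  # group-level random intercept
        for _ in range(group_size):
            x = rng.gauss(0.0, 1.0)  # individual-level covariate
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x + u_j)))
            y = 1 if rng.random() < p else 0
            data.append((j, x, y))
    return data
```

For example, `min_group_size(0.05)` returns 21: twenty individuals give exactly one expected event at 5% prevalence, which does not exceed one. By the same arithmetic, the recommended minimum group size of 50 satisfies the rule only for prevalences above 0.02.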
format	Text
id	pubmed-1955447
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1955447 2007-08-29 A simulation study of sample size for multilevel logistic regression models. Moineddin, Rahim; Matheson, Flora I; Glazier, Richard H. BMC Med Res Methodol. Research Article. BioMed Central 2007-07-16 /pmc/articles/PMC1955447/ /pubmed/17634107 http://dx.doi.org/10.1186/1471-2288-7-34 Text en
Copyright © 2007 Moineddin et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A simulation study of sample size for multilevel logistic regression models
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1955447/ https://www.ncbi.nlm.nih.gov/pubmed/17634107 http://dx.doi.org/10.1186/1471-2288-7-34

Similar items