Sample Size Guidelines for Logistic Regression from Observational Studies with Large Population: Emphasis on the Accuracy Between Statistics and Parameters Based on Real Life Clinical Data

BACKGROUND: Different study designs and population size may require different sample size for logistic regression. This study aims to propose sample size guidelines for logistic regression based on observational studies with large population. METHODS: We estimated the minimum sample size required ba...

Bibliographic Details
Main Authors: Bujang, Mohamad Adam, Sa’at, Nadiah, Sidik, Tg Mohd Ikhwan Tg Abu Bakar, Joo, Lim Chien
Format: Online Article Text
Language: English
Published: Penerbit Universiti Sains Malaysia 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6422534/
https://www.ncbi.nlm.nih.gov/pubmed/30914854
http://dx.doi.org/10.21315/mjms2018.25.4.12
author Bujang, Mohamad Adam
Sa’at, Nadiah
Sidik, Tg Mohd Ikhwan Tg Abu Bakar
Joo, Lim Chien
collection PubMed
description BACKGROUND: Different study designs and population sizes may require different sample sizes for logistic regression. This study aims to propose sample size guidelines for logistic regression based on observational studies with a large population. METHODS: We estimated the minimum sample size required, using real clinical data to evaluate how closely the derived statistics match the actual parameters. The derived Nagelkerke r-squared and coefficients were compared with their respective parameters. RESULTS: With a minimum sample size of 500, the differences between the sample estimates and the population parameters were sufficiently small. Based on an audit of a medium-sized population, the differences were within ± 0.5 for coefficients and ± 0.02 for Nagelkerke r-squared. For a large population, the differences were within ± 1.0 for coefficients and ± 0.02 for Nagelkerke r-squared. CONCLUSIONS: For observational studies with a large population that involve logistic regression in the analysis, a minimum sample size of 500 is necessary to derive statistics that represent the parameters. The other recommended rules of thumb are an EPV (events per variable) of 50 and the formula n = 100 + 50i, where i is the number of independent variables in the final model.
format Online
Article
Text
id pubmed-6422534
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Penerbit Universiti Sains Malaysia
record_format MEDLINE/PubMed
spelling pubmed-6422534 2019-03-26
Malays J Med Sci
Original Article
Penerbit Universiti Sains Malaysia 2018-07 2018-08-30
/pmc/articles/PMC6422534/ /pubmed/30914854 http://dx.doi.org/10.21315/mjms2018.25.4.12
Text en © Penerbit Universiti Sains Malaysia, 2018 This work is licensed under the terms of the Creative Commons Attribution (CC BY) (http://creativecommons.org/licenses/by/4.0/).
title Sample Size Guidelines for Logistic Regression from Observational Studies with Large Population: Emphasis on the Accuracy Between Statistics and Parameters Based on Real Life Clinical Data
topic Original Article
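
The rules of thumb quoted in the description field can be turned into a small calculation. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the conversion from EPV to a total sample size (events needed divided by the expected proportion of events) is the conventional reading of "events per variable", which the record itself does not spell out. The `event_rate` parameter is likewise an assumption the reader must supply.

```python
import math


def n_from_formula(num_predictors: int) -> int:
    """Sample size from the quoted rule n = 100 + 50i,
    where i is the number of independent variables in the final model."""
    if num_predictors < 0:
        raise ValueError("num_predictors must be non-negative")
    return 100 + 50 * num_predictors


def n_from_epv(num_predictors: int, event_rate: float, epv: int = 50) -> int:
    """Total sample size implied by an events-per-variable (EPV) target.

    Assumption (not stated in the record): the number of events required is
    epv * num_predictors, and the total sample size is that count divided by
    the expected proportion of events in the sample.
    """
    if not 0 < event_rate <= 1:
        raise ValueError("event_rate must be in (0, 1]")
    events_needed = epv * num_predictors
    return math.ceil(events_needed / event_rate)


if __name__ == "__main__":
    predictors = 4      # hypothetical number of independent variables
    rate = 0.25         # hypothetical proportion of events
    print("n = 100 + 50i :", n_from_formula(predictors))
    print("EPV of 50     :", n_from_epv(predictors, rate))
    print("flat minimum  :", 500)  # the minimum quoted in the abstract
```

The three figures are printed side by side because the record lists them as separate rules and does not say how to combine them.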