
A solution to minimum sample size for regressions

Regressions and meta-regressions are widely used to estimate patterns and effect sizes in various disciplines. However, many biological and medical analyses use relatively low sample size (N), contributing to concerns on reproducibility. What is the minimum N to identify the most plausible data patt...


Bibliographic Details
Main Authors:	Jenkins, David G., Quintana-Ascencio, Pedro F.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2020
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7034864/ https://www.ncbi.nlm.nih.gov/pubmed/32084211 http://dx.doi.org/10.1371/journal.pone.0229345

_version_	1783499958335307776
author	Jenkins, David G. Quintana-Ascencio, Pedro F.
author_facet	Jenkins, David G. Quintana-Ascencio, Pedro F.
author_sort	Jenkins, David G.
collection	PubMed
description	Regressions and meta-regressions are widely used to estimate patterns and effect sizes in various disciplines. However, many biological and medical analyses use relatively low sample size (N), contributing to concerns on reproducibility. What is the minimum N to identify the most plausible data pattern using regressions? Statistical power analysis is often used to answer that question, but it has its own problems and logically should follow model selection to first identify the most plausible model. Here we make null, simple linear and quadratic data with different variances and effect sizes. We then sample and use information theoretic model selection to evaluate minimum N for regression models. We also evaluate the use of coefficient of determination (R²) for this purpose; it is widely used but not recommended. With very low variance, both false positives and false negatives occurred at N < 8, but data shape was always clearly identified at N ≥ 8. With high variance, accurate inference was stable at N ≥ 25. Those outcomes were consistent at different effect sizes. Akaike Information Criterion weights (AICc w_i) were essential to clearly identify patterns (e.g., simple linear vs. null); R² or adjusted R² values were not useful. We conclude that a minimum N = 8 is informative given very little variance, but minimum N ≥ 25 is required for more variance. Alternative models are better compared using information theory indices such as AIC but not R² or adjusted R². Insufficient N and R²-based model selection apparently contribute to confusion and low reproducibility in various disciplines. To avoid those problems, we recommend that research based on regressions or meta-regressions use N ≥ 25.
format	Online Article Text
id	pubmed-7034864
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-7034864 2020-02-27 A solution to minimum sample size for regressions Jenkins, David G. Quintana-Ascencio, Pedro F. PLoS One Research Article
Public Library of Science 2020-02-21 /pmc/articles/PMC7034864/ /pubmed/32084211 http://dx.doi.org/10.1371/journal.pone.0229345 Text en © 2020 Jenkins, Quintana-Ascencio http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Jenkins, David G. Quintana-Ascencio, Pedro F. A solution to minimum sample size for regressions
title	A solution to minimum sample size for regressions
title_full	A solution to minimum sample size for regressions
title_fullStr	A solution to minimum sample size for regressions
title_full_unstemmed	A solution to minimum sample size for regressions
title_short	A solution to minimum sample size for regressions
title_sort	solution to minimum sample size for regressions
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7034864/ https://www.ncbi.nlm.nih.gov/pubmed/32084211 http://dx.doi.org/10.1371/journal.pone.0229345
work_keys_str_mv	AT jenkinsdavidg asolutiontominimumsamplesizeforregressions AT quintanaascenciopedrof asolutiontominimumsamplesizeforregressions AT jenkinsdavidg solutiontominimumsamplesizeforregressions AT quintanaascenciopedrof solutiontominimumsamplesizeforregressions
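
The abstract describes simulating data of a known shape, sampling it at a given N, and comparing null, simple linear and quadratic regressions by AICc weights. The following is a minimal sketch of that idea, not the authors' code: the function names `aicc_weights` and `selection_rate`, the uniform x range, the noise model, the slope and the number of trials are all assumptions chosen for illustration. It uses the standard least-squares form AIC = N ln(RSS/N) + 2k with the small-sample correction 2k(k+1)/(N-k-1), so it needs N ≥ 6 for the quadratic model.

```python
import numpy as np


def aicc_weights(x, y, degrees=(0, 1, 2)):
    """Return AICc weights for polynomial fits of the given degrees.

    Degree 0 is the null (intercept-only) model, 1 is simple linear,
    2 is quadratic. Requires len(y) >= 6 so the correction term is finite.
    """
    n = len(y)
    scores = []
    for d in degrees:
        coeffs = np.polyfit(x, y, d)  # ordinary least-squares fit
        rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
        k = d + 2  # d + 1 coefficients plus the residual variance
        aic = n * np.log(rss / n) + 2 * k
        scores.append(aic + 2 * k * (k + 1) / (n - k - 1))
    delta = np.array(scores) - min(scores)  # difference from the best model
    w = np.exp(-delta / 2)
    return w / w.sum()


def selection_rate(n, sigma, slope=1.0, trials=500, seed=0):
    """Fraction of simulated linear data sets where the linear model wins.

    Hypothetical setup: x ~ Uniform(0, 10), y = slope * x + Normal(0, sigma).
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, 10, n)
        y = slope * x + rng.normal(0, sigma, n)
        hits += int(np.argmax(aicc_weights(x, y)) == 1)
    return hits / trials
```

Calling `selection_rate(8, sigma)` and `selection_rate(25, sigma)` for a few values of `sigma` gives a rough feel for how the chance of recovering the true shape changes with N and variance; the exact rates depend on the assumed settings and should not be read as reproducing the thresholds reported in the article.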
