How large should the next study be? Predictive power and sample size requirements for replication studies

Bibliographic Details
Main authors: van Zwet, Erik W., Goodman, Steven N.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9325423/
https://www.ncbi.nlm.nih.gov/pubmed/35396714
http://dx.doi.org/10.1002/sim.9406
Description
Summary: We use information derived from over 40K trials in the Cochrane Collaboration database of systematic reviews (CDSR) to compute the replication probability, or predictive power of an experiment given its observed (two-sided) p-value. We find that an exact replication of a marginally significant result with p = 0.05 has less than 30% chance of again reaching significance. Moreover, the replication of a result with p = 0.005 still has only 50% chance of significance. We also compute the probability that the direction (sign) of the estimated effect is correct, which is closely related to the type S error of Gelman and Tuerlinckx. We find that if an estimated effect has p = 0.05, there is a 93% probability that its sign is correct. If p = 0.005, then that probability is 99%. Finally, we compute the required sample size for a replication study to achieve some specified power conditional on the p-value of the original study. We find that the replication of a result with p = 0.05 requires a sample size more than 16 times larger than the original study to achieve 80% power, while p = 0.005 requires at least 3.5 times larger sample size. These findings confirm that failure to replicate the statistical significance of a trial does not necessarily indicate that the original result was a fluke.
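
The kind of calculation the summary describes can be illustrated with a minimal Python sketch. It is not the authors' method: the paper derives the distribution of true effects from the Cochrane data, whereas this sketch assumes a single zero-mean normal prior on the signal-to-noise ratio, expressed through a hypothetical shrinkage factor `s` (with `s = 1` meaning a flat prior, i.e. no shrinkage). The function names, the parameter `k` (replication sample size as a multiple of the original), and the choice to count significance in either direction are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # two-sided 5% threshold, about 1.96


def z_from_p(p):
    """Absolute z-statistic corresponding to a two-sided p-value."""
    return STD_NORMAL.inv_cdf(1 - p / 2)


def shrinkage(tau):
    """Shrinkage factor s = tau^2 / (1 + tau^2) for an assumed N(0, tau^2) prior."""
    return tau**2 / (1 + tau**2)


def predictive_power(p, s=1.0, k=1.0):
    """Probability that a replication reaches two-sided significance at 5%.

    Assumed model (illustrative only): the posterior for the signal-to-noise
    ratio given the original z is N(s*z, s), and a replication with k times
    the original sample size has z_rep ~ N(sqrt(k)*s*z, k*s + 1).
    """
    z = z_from_p(p)
    mean = sqrt(k) * s * z
    sd = sqrt(k * s + 1)
    upper = 1 - STD_NORMAL.cdf((Z_CRIT - mean) / sd)
    lower = STD_NORMAL.cdf((-Z_CRIT - mean) / sd)
    return upper + lower


def prob_sign_correct(p, s=1.0):
    """Posterior probability that the estimated effect has the correct sign."""
    return STD_NORMAL.cdf(sqrt(s) * z_from_p(p))


if __name__ == "__main__":
    for p in (0.05, 0.005):
        print(p, predictive_power(p), predictive_power(p, s=shrinkage(1.0)))
```

With no shrinkage, an exact replication of a result with p = 0.05 has roughly a 50% chance of significance under this model. Any prior that pulls the estimate toward zero (`s < 1`) lowers that figure, which is the direction of the effect the summary reports; the specific percentages in the summary come from the authors' empirical analysis and are not reproduced by this simplified prior.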