Combined test versus logrank/Cox test in 50 randomised trials


Bibliographic Details
Main authors: Royston, Patrick; Choodari-Oskooei, Babak; Parmar, Mahesh K. B.; Rogers, Jennifer K.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6424007/
https://www.ncbi.nlm.nih.gov/pubmed/30885277
http://dx.doi.org/10.1186/s13063-019-3251-5
Description
Summary: BACKGROUND: The logrank test and the Cox proportional hazards model are routinely applied in the design and analysis of randomised controlled trials (RCTs) with time-to-event outcomes. Usually, sample size and power calculations assume proportional hazards (PH) of the treatment effect, i.e. the hazard ratio is constant over the entire follow-up period. If the PH assumption fails, the power of the logrank/Cox test may be reduced, sometimes severely. It is, therefore, important to understand how serious this can become in real trials, and for a proven, alternative test to be available to increase the robustness of the primary test.
METHODS: We performed a systematic search to identify relevant articles in four leading medical journals that publish results of phase 3 clinical trials. Altogether, 50 articles satisfied our inclusion criteria. We digitised published Kaplan–Meier curves and created approximations to the original times to event or censoring at the individual patient level. Using the reconstructed data, we tested for non-PH in all 50 trials. We compared the results from the logrank/Cox test with those from the combined test recently proposed by Royston and Parmar.
RESULTS: The PH assumption was checked and reported in only 28% of the studies. Evidence of non-PH at the 0.10 level was detected in 31% of comparisons. The Cox test of the treatment effect was significant at the 0.05 level in 49% of comparisons, and the combined test in 55%. In four of five trials with discordant results, the interpretation would have changed had the combined test been used. The degree of non-PH and the dominance of the p value for the combined test were strongly associated. Graphical investigation suggested that non-PH was mostly due to a treatment effect appearing early in follow-up and disappearing later.
CONCLUSIONS: The evidence for non-PH is checked (and, hence, identified) in only a small minority of RCTs, but non-PH may be present in a substantial fraction of such trials. In our reanalysis of the reconstructed data from 50 trials, the combined test outperformed the Cox test overall. The combined test is a promising approach to making trial design and analysis more robust.
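
For readers unfamiliar with the logrank comparison that the summary refers to, the following is a minimal sketch of a two-group logrank test on individual-level data (time, event indicator, treatment arm). It is an illustration only, not the authors' code, and it does not implement the combined test of Royston and Parmar. The function name `logrank_test` and its argument names are assumptions introduced here.

```python
import math


def logrank_test(times, events, groups):
    """Two-group logrank test (illustrative sketch, not the authors' code).

    times  : follow-up time for each patient
    events : 1 if the event was observed, 0 if censored
    groups : 0 (control) or 1 (treatment)
    Returns (chi-square statistic on 1 df, two-sided p value).
    """
    data = sorted(zip(times, events, groups))
    n = len(data)                      # patients at risk overall
    n1 = sum(g for _, _, g in data)    # patients at risk in group 1
    obs_minus_exp = 0.0                # sum of (observed - expected) events, group 1
    variance = 0.0                     # hypergeometric variance, summed over event times

    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = leaving = leaving1 = 0
        # Collect everyone whose follow-up ends at time t (events and censorings).
        while i < len(data) and data[i][0] == t:
            _, e, g = data[i]
            d += e
            d1 += e * g
            leaving += 1
            leaving1 += g
            i += 1
        if d > 0 and n > 1:
            frac1 = n1 / n
            obs_minus_exp += d1 - d * frac1
            variance += d * frac1 * (1 - frac1) * (n - d) / (n - 1)
        # Remove these patients from the risk sets before the next time point.
        n -= leaving
        n1 -= leaving1

    if variance == 0:
        raise ValueError("no informative event times; statistic undefined")
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Because the statistic weights every event time equally, it is most sensitive when the hazard ratio is constant, which is why its power can fall under non-PH patterns such as the early, transient treatment effects described in the summary.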