Approaches to analyzing binary data for large-scale A/B testing
Main authors: | Zhou, Wenru; Kroehl, Miranda; Meier, Maxene; Kaizer, Alexander |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9982610/ https://www.ncbi.nlm.nih.gov/pubmed/36875556 http://dx.doi.org/10.1016/j.conctc.2023.101091 |
Field | Value |
---|---|
author | Zhou, Wenru; Kroehl, Miranda; Meier, Maxene; Kaizer, Alexander |
collection | PubMed |
description | An industry-academic collaboration was established to evaluate the choice of statistical test and study design for A/B testing in large-scale industry experiments. Specifically, the industry partner's standard approach was to apply a t-test to all outcomes, both continuous and binary, and to use naïve interim monitoring strategies whose implications for operating characteristics such as power and type I error rates had not been evaluated. Although many papers have summarized the robustness of the t-test, an evaluation of its performance for large-scale proportion data in the A/B testing context, with or without interim analyses, is still needed. Investigating the effect of interim analyses on the robustness of the t-test is important because interim analyses rely on a fraction of the total sample size, and one should ensure that the desired properties hold when a t-test is used not just at the end of the study but also for interim decisions. Through simulation studies, the performance of the t-test, the Chi-squared test, and the Chi-squared test with Yates's correction is evaluated for binary outcome data. Further, interim monitoring with a naïve approach that applies no correction for multiple testing is compared with the O'Brien-Fleming boundary in designs that allow early termination for futility, difference, or both. Results indicate that, with the large sample sizes used in industrial A/B tests, the t-test achieves power and type I error rates for binary outcome data similar to those of the Chi-squared tests, both with and without interim monitoring, and that naïve interim monitoring without corrections leads to poorly performing studies. |
format | Online Article Text |
id | pubmed-9982610 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
journal | Contemp Clin Trials Commun (Elsevier), 2023-02-16 |
license | © 2023 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | Approaches to analyzing binary data for large-scale A/B testing |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9982610/ https://www.ncbi.nlm.nih.gov/pubmed/36875556 http://dx.doi.org/10.1016/j.conctc.2023.101091 |
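The comparison summarized in the description field can be illustrated with a minimal Python sketch. It is not the authors' simulation code, and every function name and setting in it is hypothetical. It assumes a Student pooled-variance t-test applied directly to 0/1 outcomes, and Pearson's Chi-squared test with and without Yates's continuity correction on the 2x2 arm-by-outcome table. It also assumes that "naïve" monitoring means testing against the unadjusted 1.96 boundary at every interim look. Only stopping for a difference is modelled (no futility rule). The t statistic's p-value uses the standard normal tail, an approximation that is very close to the t distribution at large sample sizes. The O'Brien-Fleming constant of about 2.024 for four equally spaced looks at two-sided α = 0.05 is taken from standard group-sequential tables. The settings (four looks, 200 users per arm per look, a 10% conversion rate) are illustrative only.

```python
import math
import random


def t_stat(xa, na, xb, nb):
    """Student's pooled-variance two-sample t statistic for 0/1 outcomes,
    computed from success counts: xa of na users in arm A, xb of nb in arm B."""
    pa, pb = xa / na, xb / nb
    # Sample variance of a 0/1 vector (n - 1 denominator).
    va = na * pa * (1 - pa) / (na - 1)
    vb = nb * pb * (1 - pb) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    if sp2 == 0:
        # Both arms constant: t is undefined; treat any difference as infinite.
        return 0.0 if pa == pb else math.copysign(math.inf, pa - pb)
    return (pa - pb) / math.sqrt(sp2 * (1 / na + 1 / nb))


def chi2_stat(xa, na, xb, nb, yates=False):
    """Pearson Chi-squared statistic for the 2x2 (arm x outcome) table,
    optionally with Yates's continuity correction."""
    table = [[xa, na - xa], [xb, nb - xb]]
    n = na + nb
    col_totals = [xa + xb, n - xa - xb]
    stat = 0.0
    for i, row_total in enumerate((na, nb)):
        for j in range(2):
            expected = row_total * col_totals[j] / n
            if expected == 0:
                continue  # an empty outcome column contributes nothing
            dev = abs(table[i][j] - expected)
            if yates:
                dev = max(dev - 0.5, 0.0)  # shrink |O - E| by 0.5, not below 0
            stat += dev ** 2 / expected
    return stat


def p_from_z(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_from_chi2_1df(x):
    """Upper-tail p-value for a Chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def type_one_error(looks, n_per_look, p, bounds, reps, seed=1):
    """Monte Carlo type I error when |t| is compared with bounds[k] at each of
    `looks` equally spaced looks (stopping for a difference only). Both arms
    share the true rate p, so every rejection is a false positive."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xa = xb = 0
        crossed = False
        # All looks are simulated even after a crossing, so runs with the same
        # seed but different bounds see identical data.
        for k in range(looks):
            xa += sum(rng.random() < p for _ in range(n_per_look))
            xb += sum(rng.random() < p for _ in range(n_per_look))
            n = (k + 1) * n_per_look
            if abs(t_stat(xa, n, xb, n)) > bounds[k]:
                crossed = True
        rejections += crossed
    return rejections / reps


# Hypothetical 2x2 data: 120/1000 conversions in arm A versus 100/1000 in arm B.
t_example = t_stat(120, 1000, 100, 1000)
print(f"t-test p (normal approx.) = {p_from_z(t_example):.4f}")
print(f"Chi-squared p = {p_from_chi2_1df(chi2_stat(120, 1000, 100, 1000)):.4f}")
print(f"Yates p = {p_from_chi2_1df(chi2_stat(120, 1000, 100, 1000, yates=True)):.4f}")

# Hypothetical monitoring settings: 4 looks, 200 users per arm per look, 10% rate.
LOOKS = 4
naive_bounds = [1.96] * LOOKS  # unadjusted test at every look
# O'Brien-Fleming: c_k = C * sqrt(K / k) with C ~ 2.024 for K = 4, two-sided 0.05.
obf_bounds = [2.024 * math.sqrt(LOOKS / k) for k in range(1, LOOKS + 1)]
naive_rate = type_one_error(LOOKS, 200, 0.10, naive_bounds, reps=2000)
obf_rate = type_one_error(LOOKS, 200, 0.10, obf_bounds, reps=2000)
print(f"naive: {naive_rate:.3f}  O'Brien-Fleming: {obf_rate:.3f}")
```

For the 2x2 example the squared t statistic and the uncorrected Chi-squared statistic nearly coincide, while Yates's correction gives a smaller statistic and therefore a larger p-value. Both boundary sets are evaluated on identical simulated data, and every O'Brien-Fleming bound is at least 1.96, so the naïve rate can never fall below the O'Brien-Fleming rate. Under these null settings the naïve design is expected to reject well above the nominal 5%.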