
Approaches to analyzing binary data for large-scale A/B testing

An industry-academic collaboration was established to evaluate the choice of statistical test and study design for A/B testing in larger-scale industry experiments. Specifically, the standard approach at the industry partner was to apply a t-test for all outcomes, both continuous and binary, and to...


Bibliographic Details
Main authors:	Zhou, Wenru, Kroehl, Miranda, Meier, Maxene, Kaizer, Alexander
Format:	Online Article Text
Language:	English
Published:	Elsevier 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9982610/ https://www.ncbi.nlm.nih.gov/pubmed/36875556 http://dx.doi.org/10.1016/j.conctc.2023.101091

author	Zhou, Wenru Kroehl, Miranda Meier, Maxene Kaizer, Alexander
collection	PubMed
description	An industry-academic collaboration was established to evaluate the choice of statistical test and study design for A/B testing in larger-scale industry experiments. Specifically, the standard approach at the industry partner was to apply a t-test for all outcomes, both continuous and binary, and to apply naïve interim monitoring strategies whose implications for operating characteristics such as power and type I error rates had not been evaluated. Although many papers have summarized the robustness of the t-test, an evaluation of its performance in the A/B testing context of large-scale proportion data, with or without interim analyses, is needed. Investigating the effect of interim analyses on the robustness of the t-test is important because interim analyses rely on a fraction of the total sample size, and one should ensure that the desired properties are maintained when a t-test is used not just at the end of the study but also for making interim decisions. Through simulation studies, the performance of the t-test, the Chi-squared test, and the Chi-squared test with Yates's correction is evaluated when applied to binary outcomes data. Further, interim monitoring through a naïve approach with no correction for multiple testing is compared with the O'Brien-Fleming boundary in designs that allow early termination for futility, difference, or both. Results indicate that the t-test achieves similar power and type I error rates for binary outcomes data with the large sample sizes used in industrial A/B tests, with and without interim monitoring, and that naïve interim monitoring without corrections leads to poorly performing studies.
format	Online Article Text
id	pubmed-9982610
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-9982610 2023-03-04 Approaches to analyzing binary data for large-scale A/B testing Zhou, Wenru Kroehl, Miranda Meier, Maxene Kaizer, Alexander Contemp Clin Trials Commun Article Elsevier 2023-02-16 /pmc/articles/PMC9982610/ /pubmed/36875556 http://dx.doi.org/10.1016/j.conctc.2023.101091 Text en © 2023 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title	Approaches to analyzing binary data for large-scale A/B testing
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9982610/ https://www.ncbi.nlm.nih.gov/pubmed/36875556 http://dx.doi.org/10.1016/j.conctc.2023.101091
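
The comparison summarized in the description above (t-test versus Chi-squared test, with and without Yates's correction, on binary outcomes) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' simulation code: the function name `rejection_rates`, the conversion rate of 0.10, the 5,000 users per arm, and the number of replicates are all illustrative choices that do not come from the record. It covers only a single final analysis and does not model interim looks or the O'Brien-Fleming boundary.

```python
import numpy as np
from scipy import stats


def rejection_rates(p_a, p_b, n_per_arm, n_sims=2000, alpha=0.05, seed=0):
    """Estimate how often each test rejects H0: p_a == p_b.

    With p_a == p_b the returned rates estimate the type I error;
    with p_a != p_b they estimate power. All defaults are illustrative.
    """
    rng = np.random.default_rng(seed)
    n = n_per_arm
    rejections = {"t_test": 0, "chi2": 0, "chi2_yates": 0}

    for _ in range(n_sims):
        # Number of "successes" (e.g. conversions) observed in each arm.
        x_a = rng.binomial(n, p_a)
        x_b = rng.binomial(n, p_b)

        # For 0/1 data the sample mean is the proportion, and the sample
        # variance (ddof=1) equals n / (n - 1) * p_hat * (1 - p_hat).
        ph_a, ph_b = x_a / n, x_b / n
        sd_a = np.sqrt(n / (n - 1) * ph_a * (1 - ph_a))
        sd_b = np.sqrt(n / (n - 1) * ph_b * (1 - ph_b))
        t_p = stats.ttest_ind_from_stats(ph_a, sd_a, n, ph_b, sd_b, n).pvalue

        # 2x2 table of successes and failures per arm.
        table = [[x_a, n - x_a], [x_b, n - x_b]]
        chi_p = stats.chi2_contingency(table, correction=False)[1]
        yates_p = stats.chi2_contingency(table, correction=True)[1]

        rejections["t_test"] += t_p < alpha
        rejections["chi2"] += chi_p < alpha
        rejections["chi2_yates"] += yates_p < alpha

    return {name: count / n_sims for name, count in rejections.items()}


if __name__ == "__main__":
    # Null scenario: both arms share the same (assumed) conversion rate.
    print("type I error:", rejection_rates(0.10, 0.10, n_per_arm=5000))
    # Alternative scenario: an assumed one-percentage-point lift.
    print("power:       ", rejection_rates(0.10, 0.11, n_per_arm=5000))
```

Running the script prints estimated rejection rates for each test under a null and an alternative scenario, which is one way to check whether the three tests behave alike at a given sample size. Because Yates's correction shrinks the test statistic, its rejection rate can never exceed that of the uncorrected Chi-squared test on the same data.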
