Probabilistic and Systematic Coverage of Consecutive Test-Method Pairs for Detecting Order-Dependent Flaky Tests

Bibliographic Details
Main Authors: Wei, Anjiang; Yi, Pu; Xie, Tao; Marinov, Darko; Lam, Wing
Format: Online Article Text
Language: English
Published: 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7979169/
http://dx.doi.org/10.1007/978-3-030-72016-2_15
Description
Summary: Software developers frequently check their code changes by running a set of tests against their code. Tests that can nondeterministically pass or fail when run on the same code version are called flaky tests. These tests are a major problem because they can mislead developers into debugging their recent code changes when the failures are unrelated to these changes. One prominent category of flaky tests is order-dependent (OD) tests, which can deterministically pass or fail depending on the order in which the tests are run. By detecting OD tests in advance, developers can fix these tests before they change their code. Because exploring all possible orders is prohibitively expensive (n! permutations for n tests), prior work has developed tools that randomize orders to detect OD tests. Experiments have shown that randomization can detect many OD tests, and that most OD tests depend on just one other test to fail. However, there was no analysis of the probability that randomized orders detect OD tests. In this paper, we present the first such analysis and also present a simple change for sampling random test orders to increase this probability. We finally present a novel algorithm to systematically explore all consecutive pairs of tests, guaranteeing to detect all OD tests that depend on one other test, while running substantially fewer orders and tests than simply running all test pairs.
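
The counting behind the summary can be illustrated with a small sketch. This is not the paper's analysis or algorithm; it only enumerates all orders of a tiny, hypothetical suite to show how often one test runs before, or immediately before, another in a uniformly random order. The function names and the test indices 0 and 1 are assumptions made for illustration.

```python
import math
from fractions import Fraction
from itertools import permutations


def fraction_of_orders(n, predicate):
    """Exact fraction of all n! orders of tests 0..n-1 that satisfy predicate."""
    orders = list(permutations(range(n)))
    hits = sum(1 for order in orders if predicate(order))
    return Fraction(hits, len(orders))


def runs_before(order, a=0, b=1):
    """True if test a runs anywhere before test b in this order."""
    return order.index(a) < order.index(b)


def runs_immediately_before(order, a=0, b=1):
    """True if test a runs directly before test b in this order."""
    return order.index(a) + 1 == order.index(b)


def consecutive_pairs(order):
    """Set of ordered pairs (x, y) where y runs right after x."""
    return set(zip(order, order[1:]))


n = 5  # hypothetical suite size, small enough to enumerate every order
print(math.factorial(n))                               # number of possible orders
print(n * (n - 1))                                     # number of ordered test pairs
print(fraction_of_orders(n, runs_before))              # a somewhere before b
print(fraction_of_orders(n, runs_immediately_before))  # a directly before b
```

One order of n tests contains n - 1 consecutive pairs, and there are n(n - 1) ordered pairs in total, so at least n orders are needed to cover every consecutive pair. How close a particular construction gets to that bound is not something this sketch addresses.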