Mitigate JavaScript Flaky Unit Tests

Flaky tests are a constant companion of automated testing. Unit tests are less exposed to flakiness than UI end-to-end browser tests, but the issue still exists.

The approaches to mitigating flaky tests for E2E and unit tests are quite different.

The main differences between these tests lie in their very definition:

A unit test is a type of software test that verifies the functionality of a specific, isolated piece of code — typically a function or method — by checking its output against expected results.
An end-to-end (E2E) test is a type of software test that verifies the complete workflow of an application, from start to finish, to ensure that all (or most) of its components (frontend, backend, database, and third-party services) function correctly together.

Next, I will consider ways to reduce the flakiness of unit tests in JavaScript. Such tests have nothing in common with integration tests or with complicated API and E2E tests, which need HTTP requests and browsers to run.

1. Rewrite your flaky tests

The most common cause of flaky tests is concurrency-related issues (e.g., async wait, race conditions, atomicity violations, or deadlocks) [1], [2]. This may sound silly and obvious, but rewriting tests is the most common and effective way to eliminate these kinds of flaky tests [2], [3].
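To illustrate the «async wait» category, here is a minimal sketch. The function fetchScore and its 10 ms delay are hypothetical stand-ins for any asynchronous code under test; they are not taken from the cited studies.

```typescript
import { test, expect } from "vitest";

// Hypothetical async function under test (an assumption for illustration).
async function fetchScore(): Promise<number> {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return 42;
}

// ❌ The promise is neither awaited nor returned: the test function finishes
// before the assertion runs, so a failing assertion is not tied to this test.
test("should fetch score (not awaited)", () => {
  fetchScore().then((score) => {
    expect(score).toBe(42);
  });
});

// ✅ The test awaits the promise, so it ends only after the assertion has run.
test("should fetch score (awaited)", async () => {
  await expect(fetchScore()).resolves.toBe(42);
});
```

In the first test, a wrong value may surface later as an unhandled rejection, or not at all, depending on timing. The second test always reports it as its own failure.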

The following points outline the approaches to be followed when rewriting unit tests.

2. Maintain maximum test independence

Unit tests are robust and reliable when they are self-contained and do not depend on the execution or results of other tests. Test independence rests on three practices:

Avoid sharing state across tests;
Keep test data isolated;
Avoid order-dependent tests.

For the code examples below, I will use Vitest as the testing framework. It is not the most widely used one in JavaScript (that is currently Jest), but it is one of the fastest-growing [4].

Avoid sharing state across tests

The «state» refers to variables or in-memory objects that different tests can access. The easiest way to avoid sharing it is to avoid global variables in tests.

❌ Bad example

import { test, expect } from "vitest";

let counter = 0;

test("should increment counter", () => {
  counter += 1;
  expect(counter).toBe(1);
});

test("should decrement counter", () => {
  counter -= 1;
  expect(counter).toBe(0); // ❌ Fails if tests run in a different order due to shared variable
});

✅ Good example

import { test, expect } from "vitest";

test("should increment counter", () => {
  let counter = 0;
  counter += 1;
  expect(counter).toBe(1);
});

test("should decrement counter", () => {
  let counter = 0;
  counter -= 1;
  expect(counter).toBe(-1);
});

The example above keeps test variables inside each test function.

Keep test data isolated

As with shared state, resources, test files, and common variables/constants imported into tests should not be accessed by different tests at the same time. Each test should create its own test data and should not reuse data created for other tests.

If test data is created for a test, it should be cleaned up after the test ends. Otherwise, if the test does not perform proper cleanup, it may fail when run more than once on the same machine [1].
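One way to guarantee such cleanup is sketched below. The helper withTempDir, the directory prefix, and the file name are hypothetical; the sketch assumes Node.js and a synchronous callback.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: hands the callback a fresh temporary directory and
// removes it afterwards, even if the callback throws.
// Works for synchronous callbacks only.
function withTempDir<T>(run: (dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "unit-test-"));
  try {
    return run(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Usage inside a test body: the file exists only while the callback runs.
const leftover = withTempDir((dir) => {
  const file = join(dir, "player.json");
  writeFileSync(file, JSON.stringify({ name: "Messi", bib: 10 }));
  return file;
});

console.log(existsSync(leftover)); // false: the directory was removed
```

The same create-then-remove pair can also live in beforeEach and afterEach hooks when several tests need the directory.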

If common variables/constants are imported into the test file as test data, they should be copied for each test. For objects, this can be done by deep cloning, which prevents modification of the original object.

import { test, expect } from "vitest";
import _ from "lodash";

const original = {
  player: { name: "Messi", bib: 10 },
};

test("should not mutate the original object", () => {
  const originalCopy = _.cloneDeep(original);

  originalCopy.player.bib = 30;

  expect(original.player.bib).toBe(10);
});

In the example above, the test creates a copy of the data with the _.cloneDeep() method. Subsequent operations are performed on the copy, while the original data stays untouched.
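If pulling in lodash only for cloning is undesirable, two dependency-free variants are sketched below. They assume a runtime where structuredClone is available as a global (Node.js 17 or later); createPlayerData is a hypothetical factory name.

```typescript
// Shared constant that tests must not modify.
const original = {
  player: { name: "Messi", bib: 10 },
};

// Variant 1: deep-copy with the built-in structuredClone.
const copy = structuredClone(original);
copy.player.bib = 30;

// Variant 2: a factory that builds a fresh object on every call,
// so there is no shared object to protect in the first place.
function createPlayerData() {
  return { player: { name: "Messi", bib: 10 } };
}

const first = createPlayerData();
const second = createPlayerData();
first.player.bib = 30;

console.log(original.player.bib); // 10
console.log(second.player.bib); // 10
```

Note that structuredClone only copies plain data; it throws on values such as functions, so the factory variant is the safer choice for objects that carry methods.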

Avoid order-dependent tests

There are two types of order-dependent tests:

First, «victims» pass when run in isolation but fail when run after one or more other tests, known as «polluters» [5]. A polluter «pollutes» the state shared between the two tests (the first test in the bad example of the «Avoid sharing state across tests» paragraph can be considered a polluter).
Second, «brittle» tests fail when run in isolation but pass when run after one or more other tests, known as «state-setters» [5]. The state-setters set up the initial state that the brittle test needs in order to pass.

Both types may legitimately appear in API or E2E test sets because of the inherent complexity of the required order of HTTP requests or sequences of clicks in a browser. In unit tests, however, order dependency is unacceptable: the manipulations needed to bring the state to the desired condition should be carried out through test fixtures or setup hooks, not through other tests or test steps.

3. Use hooks for preconditions and postconditions

The best way to bring the state of the function under test to the desired condition is to use the beforeEach and/or beforeAll hooks (and afterEach or afterAll to tear down changes). These hooks keep tests independent of execution order.

❌ Bad example

import { expect, test } from 'vitest';

let dataBase: Record<string, string>[] = [];

test('should initialize data', () => {
  dataBase.push({ player: 'Messi' });
  expect(dataBase.length).toBe(1);
});

test('should get data', () => {
  expect(dataBase[0].player).toBe('Messi'); // ❌ Fails if run in isolation or in random order
});

✅ Good example

import { beforeAll, expect, test } from 'vitest';

let dataBase: Record<string, string>[] = [];

beforeAll(() => {
  dataBase.push({ player: 'Messi' });
});

test('should get data', () => {
  expect(dataBase[0].player).toBe('Messi');
});

The example above works in any order or when run individually because it does not rely on another test to set up the state.
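If tests also modify the state, beforeAll is no longer enough, because the array is created once and then shared by every test. A minimal sketch of a per-test variant follows, assuming the same dataBase array; the second test and the player name in it are made up for illustration.

```typescript
import { afterEach, beforeEach, expect, test } from "vitest";

let dataBase: Record<string, string>[] = [];

// Build the precondition from scratch before every test.
beforeEach(() => {
  dataBase = [{ player: "Messi" }];
});

// Tear the state down so nothing leaks into the next test.
afterEach(() => {
  dataBase = [];
});

test("should get data", () => {
  expect(dataBase[0].player).toBe("Messi");
});

test("should add data", () => {
  dataBase.push({ player: "Iniesta" });
  expect(dataBase).toHaveLength(2); // does not affect "should get data"
});
```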

4. Be careful with the precision of checks

When you check one value against another, allow for some tolerance and round the values being compared. Overly precise expected values may cause flakiness, especially with geographical coordinates and time.

❌ Bad example

import { test, expect } from "vitest";
import { calculateDistanceFunction } from "./calculate-distance-function";

test("should calculate distance", () => {
  const result = calculateDistanceFunction(
    [20.4489, 44.7866],
    [19.8335, 45.2671],
  );
  expect(result).toBe(72.06735); // ❌ Too precise, may cause flakiness due to floating-point precision
});

✅ Good example

import { test, expect } from "vitest";
import { calculateDistanceFunction } from "./calculate-distance-function";

test("should calculate distance", () => {
  const result = calculateDistanceFunction(
    [20.4489, 44.7866],
    [19.8335, 45.2671],
  );
  expect(result).toBeCloseTo(72, 0);
});

The example above uses a less strict comparison (check for km instead of cm precision) and toBeCloseTo as a more suitable assertion.
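To make the tolerance concrete: according to the Jest and Vitest documentation, toBeCloseTo(expected, numDigits) passes when the absolute difference is below 10^(-numDigits) / 2. The helper isCloseTo below is a hypothetical re-implementation of that rule for illustration only, not the library's code.

```typescript
// Hypothetical helper mirroring the documented toBeCloseTo rule:
// pass when |received - expected| < 10^(-numDigits) / 2.
function isCloseTo(received: number, expected: number, numDigits: number): boolean {
  return Math.abs(received - expected) < Math.pow(10, -numDigits) / 2;
}

console.log(0.1 + 0.2 === 0.3); // false: binary floating point is inexact
console.log(isCloseTo(0.1 + 0.2, 0.3, 5)); // true: the difference is far below 0.000005
console.log(isCloseTo(72.06735, 72, 0)); // true: the tolerance is 0.5
console.log(isCloseTo(72.06735, 72, 2)); // false: the tolerance is 0.005
```

So the second argument of 0 in the good example accepts any result within half a kilometre of 72.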

5. Brace yourself for time checks

Any automated test that involves time is potentially flaky. The problems lie in many areas: time zones, time accuracy (see the «Be careful with the precision of checks» paragraph), test environment settings, time handling in the language itself (for example, in JavaScript engines or test frameworks), computation speed, etc.

❌ Bad example

import { test, expect } from "vitest";
import { delayedFunction } from "./delayed-function";

test("should call callback after 100ms", async () => {
  const start = Date.now();

  await new Promise((resolve) => {
    delayedFunction(() => {
      const end = Date.now();

      expect(end - start).toBe(100); // ❌ Too strict, while execution time may vary

      resolve();
    });
  });
});

✅ Good example

import { test, expect } from "vitest";
import { delayedFunction } from "./delayed-function";

test("should call callback after approximately 100ms", async () => {
  const start = Date.now();

  await new Promise((resolve) => {
    delayedFunction(() => {
      const end = Date.now();

      expect(end - start).toBeGreaterThanOrEqual(100);
      expect(end - start).toBeLessThan(150);

      resolve();
    });
  });
});

The example above allows a small margin for a calculated time check.

Unfortunately, complex applications cannot do without time-based tests, simply because of business logic. Be deliberate when writing unit tests for time-dependent functions: use fake timers [6], use UTC instead of local time, and check seconds or milliseconds instead of microseconds.
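A minimal sketch of the fake-timer approach with Vitest's vi utilities is shown below. The body of delayedFunction is an assumption (a plain setTimeout of 100 ms), as is the fixed date used in the second test.

```typescript
import { afterEach, beforeEach, expect, test, vi } from "vitest";

// Hypothetical implementation of the function under test.
function delayedFunction(callback: () => void): void {
  setTimeout(callback, 100);
}

beforeEach(() => {
  vi.useFakeTimers(); // replace setTimeout, Date, etc. with controllable fakes
});

afterEach(() => {
  vi.useRealTimers(); // restore the real timers for other tests
});

test("should call callback after 100ms", () => {
  const callback = vi.fn();
  delayedFunction(callback);

  vi.advanceTimersByTime(99);
  expect(callback).not.toHaveBeenCalled();

  vi.advanceTimersByTime(1);
  expect(callback).toHaveBeenCalledTimes(1);
});

test("should read a fixed UTC date", () => {
  vi.setSystemTime(new Date("2024-01-15T12:00:00Z"));
  expect(new Date().toISOString()).toBe("2024-01-15T12:00:00.000Z");
});
```

Because the clock moves only when the test advances it, the check can be exact again and the test no longer waits for real time to pass.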

Timeouts inside tests are another potential time-related problem. Timeouts and sleep calls are often used to fix flaky tests, but they only decrease the chance of a flaky failure: running the tests on different machines may make the sleep calls time out and trigger the flaky failures again [2].

6. Test your tests

After a flaky test is fixed, it is good practice to retest the fix on the whole test set to make sure the problem does not reappear in another test.

To do this, check the tests for order dependence by running them in random («shuffled») order.

Vitest allows shuffling at the test-file level with sequence.shuffle.files:

npx vitest run --sequence.shuffle.files

It also allows shuffling tests inside each test file with sequence.shuffle.tests:

npx vitest run --sequence.shuffle.tests

Combining these CLI options makes the order of a test run completely random:

npx vitest run --sequence.shuffle.files --sequence.shuffle.tests

You can set your tests to always run in random order in CI. But in case of a failure, you may need to reproduce the order in which they were run, and this will not be so easy. So, to simplify debugging failed tests on a production scale, it may be better to run tests in CI in a predictable order.
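A possible middle ground, assuming the sequence.seed option described in the Vitest configuration reference, is to keep shuffling but log the seed so that a failing order can be replayed. In the sketch below, SHUFFLE_SEED is a hypothetical environment variable name.

```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";

// SHUFFLE_SEED is a hypothetical environment variable name.
// Without it, a new seed is picked on every run.
const seed = Number(process.env.SHUFFLE_SEED ?? Date.now());

// Log the seed so a failed CI run can be replayed locally.
console.log(`Shuffle seed: ${seed}`);

export default defineConfig({
  test: {
    sequence: {
      shuffle: { files: true, tests: true },
      seed,
    },
  },
});
```

To replay a failed run, set SHUFFLE_SEED to the logged value before calling npx vitest run.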

Fully independent tests can be safely run in parallel, which speeds up testing.

This article does not cover mocks (although fake timers are classified as mocks). On the one hand, this is a whole topic that needs more in-depth discussion. On the other hand, most of the above principles related to test independence can also be applied to mocks, such as not keeping a global context for mocks, not sharing the same mocks between tests, creating and deleting mocks inside hooks, etc.

References:

[1] Negar Hashemi, Amjed Tahir, and Shawn Rasheed, “An Empirical Study of Flaky Tests in JavaScript,” in 38th IEEE International Conference on Software Maintenance and Evolution (ICSME), 2022. https://doi.org/10.48550/arXiv.2207.01047
[2] Qingzhou Luo, Farah Hariri, Lamyaa Eloussi, and Darko Marinov, “An empirical analysis of flaky tests,” in Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2014. https://doi.org/10.1145/2635868.2635920
[3] Wing Lam, Kıvanç Muşlu, Hitesh Sajnani, and Suresh Thummalapenta, “A study on the lifecycle of flaky tests,” in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 2020. https://doi.org/10.1145/3377811.3381749
[4] State of JavaScript 2024: Testing
[5] Negar Hashemi, Amjed Tahir, Shawn Rasheed, August Shi, and Rachel Blagojevic, “Detecting and Evaluating Order-Dependent Flaky Tests in JavaScript,” in 18th IEEE International Conference on Software Testing, Verification and Validation (ICST), 2025. https://doi.org/10.48550/arXiv.2501.12680
[6] Testing Library: Using Fake Timers
