Question 3
Testing asynchronous & concurrent code
How do you reliably test async/await code and concurrency-heavy features to avoid flakiness and race conditions?
Answer outline
Flaky async tests almost always come from nondeterminism: real timers, real concurrency, or shared mutable state. The fix is to remove the nondeterminism, not to wait longer and hope the work has finished by the time the assertions run.
Five techniques cover the majority of cases:
- 1. Inject a controllable clock: replace Task.sleep and real timers with a TestClock or ImmediateClock so the test controls time instead of waiting for it.
- 2. Use async test functions with direct await: prefer func test() async throws over XCTestExpectation; always await the exact work being tested so assertions run after completion, not during it.
- 3. Control task lifetimes: avoid fire-and-forget Task {} in production code; use withTaskGroup so callers (and tests) can await all spawned work until it has completed.
- 4. Isolate shared mutable state: use actor-isolated types or thread-safe fakes; repeat the scenario 1,000 times and run with Thread Sanitizer to surface data races early.
- 5. Inject every async dependency: networking, databases, queues, and clocks should all be swappable for deterministic fakes, letting tests simulate failures and force ordering.
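Techniques 3 and 5 can be sketched together. This is a minimal illustration, not a prescribed design: Prefetcher, its load closure, and runPrefetchExample are hypothetical names. The checks are written as a plain async function so the snippet runs on its own; inside an XCTest target the same body would sit in a func test...() async method.

```swift
// Hypothetical production type: all child work lives inside a task group,
// so prefetch(ids:) returns only after every spawned task has finished.
struct Prefetcher {
    // Injected async dependency; tests pass a deterministic fake.
    let load: @Sendable (Int) async -> String

    func prefetch(ids: [Int]) async -> [String] {
        let load = self.load
        return await withTaskGroup(of: (Int, String).self, returning: [String].self) { group in
            for id in ids {
                group.addTask {
                    let value = await load(id)
                    return (id, value)
                }
            }
            // Children complete in an unspecified order, so collect by id...
            var byID: [Int: String] = [:]
            for await (id, value) in group {
                byID[id] = value
            }
            // ...and restore the caller's ordering before returning.
            return ids.compactMap { byID[$0] }
        }
    }
}

// Test-style usage: the fake needs no network and no timers.
func runPrefetchExample() async -> [String] {
    let sut = Prefetcher(load: { id in "item-\(id)" })
    return await sut.prefetch(ids: [3, 1, 2])
}
```

Because the group awaits its children before returning, the caller never needs a sleep or an expectation to know the work is done.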
Principles
- Inject a TestClock instead of sleeping: your test controls time, not the scheduler.
- Prefer async test functions with direct await; avoid mixing XCTestExpectation with structured concurrency.
- Use Thread Sanitizer and repeat scenarios to surface race conditions reliably.
- Test cancellation explicitly: cancel a task and assert the system under test respected it.
- One concern per async test: every extra layer multiplies race surfaces.
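The "repeat the scenario" principle might look like the sketch below. HitCounter and hammerCounter are hypothetical names chosen for illustration. The actor serializes access to its state, and the task group guarantees that all increments have finished before the count is read.

```swift
// Hypothetical SUT: an actor serializes every access to its mutable state.
actor HitCounter {
    private(set) var count = 0
    func increment() { count += 1 }
}

// Fires `iterations` concurrent increments and returns the final count.
func hammerCounter(iterations: Int) async -> Int {
    let counter = HitCounter()
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<iterations {
            group.addTask { await counter.increment() }
        }
        // The group waits for all child tasks before this closure returns.
    }
    return await counter.count
}
```

With a Swift package, a run such as swift test --sanitize=thread enables Thread Sanitizer. If HitCounter were a class with an unprotected var, the same scenario would be a likely place for it to report a data race.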
A RetryHandler waits 2 seconds between attempts. Injecting a no-op TestClock means the test runs instantly — no real waiting:
Clock injection: retry with delay
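import Foundation
import XCTest

protocol ClockType {
    func sleep(for duration: Duration) async throws
}

// Production clock: really suspends for the requested duration.
struct SystemClock: ClockType {
    func sleep(for duration: Duration) async throws {
        try await Task.sleep(for: duration)
    }
}

// Test clock: returns immediately, so retry delays cost no wall-clock time.
struct TestClock: ClockType {
    func sleep(for duration: Duration) async throws { /* no-op */ }
}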
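// Production type
struct RetryHandler {
    let clock: any ClockType

    func fetch(maxAttempts: Int, work: () async throws -> Data) async throws -> Data {
        // The range 1...maxAttempts traps at runtime when maxAttempts < 1.
        precondition(maxAttempts >= 1, "maxAttempts must be at least 1")
        var lastError: Error?
        for attempt in 1...maxAttempts {
            do {
                return try await work()
            } catch {
                lastError = error
                if attempt < maxAttempts {
                    try await clock.sleep(for: .seconds(2)) // returns immediately under TestClock
                }
            }
        }
        // Reached only after every attempt threw, so lastError is non-nil.
        throw lastError!
    }
}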
// Test: completes immediately despite the 2-second retry delays.
// (RetryHandlerTests is an illustrative class name.)
final class RetryHandlerTests: XCTestCase {
    func testRetriesOnFailure() async throws {
        var callCount = 0
        let sut = RetryHandler(clock: TestClock())
        _ = try await sut.fetch(maxAttempts: 3) {
            callCount += 1
            // Fail the first two attempts, succeed on the third.
            if callCount < 3 { throw URLError(.notConnectedToInternet) }
            return Data()
        }
        XCTAssertEqual(callCount, 3)
    }
}
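A no-op clock proves the test is fast, but it cannot show that the delay was requested at all. One possible refinement is a clock that records each requested duration. In this sketch RecordingClock, TransientFailure, retry, and runRecordingClockExample are hypothetical; ClockType is repeated and retry is a simplified stand-in for RetryHandler.fetch so the snippet compiles on its own.

```swift
// Repeated from the listing above so this sketch is self-contained.
protocol ClockType {
    func sleep(for duration: Duration) async throws
}

// Hypothetical test double: records each requested delay, returns at once.
actor RecordingClock: ClockType {
    private(set) var requested: [Duration] = []
    func sleep(for duration: Duration) async throws {
        requested.append(duration)
    }
}

struct TransientFailure: Error {}

// Simplified stand-in for RetryHandler.fetch: retry with a 2-second pause.
func retry<T>(maxAttempts: Int, clock: any ClockType,
              work: () async throws -> T) async throws -> T {
    precondition(maxAttempts >= 1, "maxAttempts must be at least 1")
    var attempt = 1
    while true {
        do {
            return try await work()
        } catch {
            if attempt >= maxAttempts { throw error }
            attempt += 1
            try await clock.sleep(for: .seconds(2))
        }
    }
}

// Test-style usage: two failures, then success on the third attempt.
func runRecordingClockExample() async throws -> (calls: Int, delays: [Duration]) {
    let clock = RecordingClock()
    var calls = 0
    _ = try await retry(maxAttempts: 3, clock: clock) { () async throws -> Int in
        calls += 1
        if calls < 3 { throw TransientFailure() }
        return 42
    }
    let delays = await clock.requested
    return (calls, delays)
}
```

Asserting on the recorded delays checks the retry policy itself (two pauses of 2 seconds for three attempts) without spending any real time.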
Cancel a task and assert the SUT observed it — don’t assume cleanup happens automatically:
Testing cancellation explicitly
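func testCancellationIsRespected() async {
    // sut is the system under test, created elsewhere (for example in setUp).
    let task = Task {
        await sut.longRunningWork()
    }
    task.cancel()
    // Await the task itself: a single Task.yield() does not guarantee that the
    // task has run, so asserting after a yield can fail intermittently.
    // This assumes longRunningWork() returns once it observes cancellation.
    await task.value
    XCTAssertTrue(sut.didCancel)
}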
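For a version that runs end to end, the sketch below supplies a concrete system under test. PollingWorker and runCancellationExample are hypothetical; the worker polls Task.isCancelled cooperatively, which is one common way for a long-running loop to respect cancellation.

```swift
// Hypothetical SUT: loops until its task is cancelled, then records that fact.
actor PollingWorker {
    private(set) var didCancel = false

    func longRunningWork() async {
        while !Task.isCancelled {
            await Task.yield() // stand-in for one unit of real work
        }
        didCancel = true
    }
}

// Test-style usage: cancel, wait for the task to finish, then inspect state.
func runCancellationExample() async -> Bool {
    let sut = PollingWorker()
    let task = Task { await sut.longRunningWork() }
    task.cancel()
    // Awaiting the task's value makes the check independent of scheduling:
    // it passes whether cancellation lands before or during the loop.
    await task.value
    return await sut.didCancel
}
```

If the worker ignored cancellation, awaiting the task would never return, so a real test suite would typically pair this with a test timeout.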