Testing

The test pyramid, the five test doubles everyone conflates, pytest idioms to know by heart, and the real causes of flakes.

What interviewers grade

Juniors say "I write unit tests." Seniors can distinguish unit / integration / e2e, pick the right test double for the seam, and explain why a specific test flakes. If you can diagnose a flake in two sentences, you signal real production experience.

The test pyramid

| Level | Scope | Speed | Fragility | Role |
| --- | --- | --- | --- | --- |
| Unit | One function / class; no I/O, no external services. | Milliseconds; run thousands per second. | Low — fail only when the code truly broke. | 80% of tests. Cover logic branches, edge cases, boundaries. |
| Integration | Multiple units together with real collaborators (DB, cache, queue). | Tens to hundreds of ms each. | Medium — schema drift, fixture data, timing. | 15%. Cover wiring: migration ran, ORM query works, txn boundaries correct. |
| End-to-end | Full system through the public entrypoint (HTTP, CLI). | Seconds to tens of seconds. | High — any cross-cutting change breaks them. | 5%. One happy path per critical user flow; more is a tax on velocity. |
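The unit/integration split is easiest to see side by side. A minimal sketch, assuming a hypothetical `apply_discount` function, with an in-memory sqlite3 database standing in for the real collaborator:

```python
import sqlite3


def apply_discount(price: float, pct: float) -> float:
    """Pure logic: the ideal unit-test target (hypothetical example)."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    return round(price * (1 - pct / 100), 2)


# Unit test: no I/O, milliseconds, fails only when the logic breaks.
def test_apply_discount_unit():
    assert apply_discount(100.0, 25) == 75.0


# Integration test: a real (if tiny) database collaborator; it verifies
# the SQL and the wiring, not just the arithmetic.
def test_discount_persists_integration():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.execute(
        "INSERT INTO orders (total) VALUES (?)", (apply_discount(100.0, 25),)
    )
    (total,) = conn.execute("SELECT total FROM orders").fetchone()
    assert total == 75.0
    conn.close()
```

The unit test would survive a database migration untouched; the integration test is exactly the one that should break if the schema drifts.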

Test doubles

Five distinct things that get called "mock" in casual conversation. Knowing the difference is a senior tell.

| Double | What it is | Use for | Avoid |
| --- | --- | --- | --- |
| Dummy | Placeholder that satisfies a signature; never called. | Filling a required constructor parameter that the test path does not exercise. | Using a dummy when the test actually invokes the collaborator — silent pass. |
| Stub | Returns canned responses to configured calls. No assertions. | Testing code that queries something — you control what comes back. | Re-implementing the real dependency; stubs should be dumb. |
| Spy | Records calls made to it; test later asserts what happened. | Verifying a logger or analytics call was emitted correctly. | Over-asserting on internal calls — couples tests to implementation. |
| Mock | Pre-programmed with expectations; verifies calls during or after the test. | Verifying behavior at a seam, e.g., "payment service was called once with these args." | Mocking objects you own. Mock at the boundary of your system, not inside it. |
| Fake | Working implementation, but simplified (in-memory DB, in-process queue). | Integration-ish tests without the real dependency — fast and deterministic. | Divergence from real behavior. Test the real thing at least once in CI. |
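In Python, `unittest.mock.Mock` can play several of these roles depending on how you use it. A sketch, assuming a hypothetical `charge` function as the seam to a payment gateway:

```python
from unittest.mock import Mock


def charge(gateway, amount):
    """Hypothetical seam: delegates to a payment-gateway collaborator."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return gateway.charge(amount)


# Stub: canned return value; the test asserts on output, not on calls.
stub = Mock()
stub.charge.return_value = {"status": "ok"}
assert charge(stub, 10) == {"status": "ok"}

# Spy/mock: the same object, but now the test verifies the interaction.
spy = Mock()
charge(spy, 25)
spy.charge.assert_called_once_with(25)


# Fake: a real, simplified implementation with its own state.
class FakeGateway:
    def __init__(self):
        self.charged = []

    def charge(self, amount):
        self.charged.append(amount)
        return {"status": "ok"}


fake = FakeGateway()
charge(fake, 5)
assert fake.charged == [5]
```

The distinction is in the assertion: the stub test checks what came back, the spy/mock test checks what was sent, and the fake keeps enough state to support either.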

pytest idioms

Parametrize

— Run the same test body against a table of inputs.
```python
@pytest.mark.parametrize("n,expected", [(0, 0), (1, 1), (5, 120)])
def test_factorial(n, expected):
    assert factorial(n) == expected
```

Fixture

— Shared setup/teardown with arbitrary scope.
```python
@pytest.fixture
def db():
    conn = make_conn()
    yield conn
    conn.close()
```

Fixture scopes

— Amortize expensive setup across many tests.
```python
@pytest.fixture(scope="session")  # or "module", "class", "function"
def app():
    return create_app()
```

monkeypatch

— Patch attribute, env var, or dict entry for one test.
```python
def test_uses_env(monkeypatch):
    monkeypatch.setenv("FEATURE_X", "on")
    assert feature_enabled("x")
```

tmp_path

— Per-test temp directory; automatically cleaned up.
```python
def test_writes_file(tmp_path):
    p = tmp_path / "out.txt"
    write_report(p)
    assert p.read_text() == "ok"
```

raises

— Assert a block raises an exception.
```python
def test_rejects_negative():
    with pytest.raises(ValueError, match="must be >= 0"):
        withdraw(-1)
```

Marks + filtering

— Group tests by speed, tier, or feature; select at CLI.
```python
@pytest.mark.slow
def test_big_scan(): ...

# Run fast tests only:
#   pytest -m "not slow"
```

Flake causes

| Cause | Symptom | Fix |
| --- | --- | --- |
| Time-dependent assertions | Test passes locally, fails on slow CI. "Expected 1.00, got 1.02." | Freeze time with `freezegun` or inject a clock. Compare timings with a tolerance. |
| Shared state between tests | Works alone; fails when another test runs first. Order-dependent. | Reset global/singleton state in teardown. Prefer pure functions and fresh fixtures. |
| Network / real external service | Fails when the network hiccups or the service is down for maintenance. | Fake / record-replay (VCR) for most tests. Real network only in a tagged slow suite. |
| Concurrency / races | Passes 99 times, fails 1 time. Harder on more cores. | Deterministic scheduling; inject a fake event loop; replace sleeps with explicit sync points. |
| Unseeded randomness | A rare input pattern fails once per week in CI. | Fix the seed in `conftest.py`. Record the failing input on assertion failure. |
| Filesystem / tmp collisions | Parallel test runs trip over each other on disk. | Use `tmp_path` (per-test dir). Never hard-code `/tmp/foo`. |
| Leaky test data | Today's test fails because yesterday's test left a row behind. | Roll back a transaction per test, or truncate tables in teardown. Factories over fixtures for volume. |
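Injecting a clock, the fix for the first row, is small enough to show in full. A sketch: `Timer` and its injected `now` callable are invented for illustration.

```python
import time


class Timer:
    """Hypothetical class that depends on wall-clock time."""

    def __init__(self, now=time.monotonic):
        self.now = now          # injected clock; defaults to the real one
        self.start = self.now()

    def elapsed(self):
        return self.now() - self.start


# Deterministic test: the fake clock advances only when we say so,
# so the assertion is exact on any machine, however slow.
def test_elapsed_is_exact():
    ticks = iter([100.0, 101.5])
    timer = Timer(now=lambda: next(ticks))
    assert timer.elapsed() == 1.5
```

Production code keeps the real clock by default; only the test swaps it out, so the seam costs one constructor parameter.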

Senior rules of thumb

TDD — when to reach for it

TDD buys you fast feedback, a design pressure toward testable seams, and a safety net while refactoring. It is not a faith: if a test does not pay rent by catching real regressions, it is waste. Skip TDD when the shape of the code is obvious; reach for it when the logic is gnarly or the cost of a bug is high.