Concurrency

Python-flavored: which mode fits the workload, the GIL, asyncio pitfalls, lock primitives, deadlock, and the idempotency toolkit that makes retries safe.

The first decision

Before writing any concurrent code, name whether the workload is I/O-bound or CPU-bound. Get this wrong and you either add threads that cannot speed up CPU-bound code (the GIL) or add processes with massive IPC overhead you did not need. Everything else follows from this decision.

Concurrency modes

| Mode | Python tool | Best for | Blind spot |
| --- | --- | --- | --- |
| Synchronous | Plain function calls | CPU work, simple scripts, anything without I/O | A single blocking call (DB query, HTTP call) stalls everything; latency adds up |
| Threads (concurrent I/O) | `threading`, `concurrent.futures.ThreadPoolExecutor` | I/O-bound work where you need familiar blocking APIs (requests, DB drivers) | GIL prevents true parallelism of Python bytecode; race conditions on shared state |
| Async (cooperative) | `asyncio`, `async`/`await`, `aiohttp`, `asyncpg` | Massive concurrent I/O (10k+ sockets); lower overhead than threads | One blocking call (sync DB driver, CPU loop) freezes the entire event loop |
| Processes (parallel CPU) | `multiprocessing`, `concurrent.futures.ProcessPoolExecutor` | CPU-bound work that must use multiple cores | IPC is expensive (pickling); fork semantics on Linux vs spawn on Windows/macOS |
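
To make the threaded-I/O row concrete, here is a minimal sketch using `ThreadPoolExecutor`, with `time.sleep` standing in for a network call and placeholder example.com URLs. Eight 0.1 s waits overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # Simulated I/O-bound call: the GIL is released while sleeping,
    # so the threads overlap these waits.
    time.sleep(0.1)
    return f"body of {url}"

urls = [f"https://example.com/{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    bodies = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# Eight 0.1 s sleeps overlap: total is well under the 0.8 s a sync loop needs.
print(f"{len(bodies)} responses in {elapsed:.2f}s")
```

Swap `fetch` for a real blocking HTTP or DB call and the shape is identical; swap the pool for `ProcessPoolExecutor` only when the work is CPU-bound.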

The GIL

| Fact | Implication |
| --- | --- |
| Only one thread executes Python bytecode at a time in CPython | Adding threads does not speed up CPU-bound Python code; use processes instead |
| The GIL is released during I/O syscalls and in many C extensions (NumPy, hashlib) | Threaded I/O and threaded NumPy work fine; the GIL hurts pure-Python CPU loops |
| Python 3.13 ships an experimental free-threaded (no-GIL) build (PEP 703) | Still opt-in and slower for single-threaded code; not the interview default yet |
| The GIL does NOT make Python thread-safe | Reads and writes to a dict or int may still interleave at arbitrary points; you still need locks |
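
A minimal illustration of the last row: `counter += 1` compiles to a read-modify-write sequence, and the GIL can switch threads between those bytecodes, so a `threading.Lock` is still needed for a correct total.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n: int) -> None:
    global counter
    for _ in range(n):
        # Without the lock, two threads can read the same value,
        # both add 1, and one update is lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(25_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 100000 every run; without the lock it can come up short
```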

Asyncio pitfalls

| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Blocking call in an async function | Event loop stalls; latency spikes to seconds because everything queues behind one call | Use async-native libraries (aiohttp, asyncpg); for unavoidable sync calls, `asyncio.to_thread(func)` |
| Unawaited coroutine | `RuntimeWarning: coroutine was never awaited`; the function never actually runs | Always `await` the call, or schedule it with `asyncio.create_task(...)` if fire-and-forget |
| Fire-and-forget task never completes | Task is garbage-collected before it runs; silent failure | Hold a reference: `task = asyncio.create_task(coro); tasks.add(task)`; remove it when done |
| Mixing sync and async by accident | `asyncio.run(coro)` called from inside a running loop raises `RuntimeError` | Within async code, `await coro`; `asyncio.run` is only a top-level entrypoint |
| Unbounded concurrency | `asyncio.gather(*[fetch(u) for u in urls])` over 10,000 URLs DDoSes the target | Bound with `asyncio.Semaphore(N)` or process in batches |
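
A sketch of the bounded-concurrency fix, with `asyncio.sleep` and placeholder URLs standing in for a real async HTTP client: the semaphore caps in-flight requests at 10 even though 100 coroutines are created.

```python
import asyncio

async def fetch(url: str, sem: asyncio.Semaphore) -> str:
    # The semaphore caps how many fetches run at once.
    async with sem:
        await asyncio.sleep(0.01)  # stand-in for an aiohttp request
        return url

async def main() -> list[str]:
    sem = asyncio.Semaphore(10)  # at most 10 in flight
    urls = [f"https://example.com/{i}" for i in range(100)]
    # gather awaits every coroutine, so nothing is fire-and-forget here.
    return await asyncio.gather(*(fetch(u, sem) for u in urls))

results = asyncio.run(main())  # asyncio.run only at the top level
print(len(results))
```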

Synchronization primitives

| Primitive | Purpose | Python API | Pitfall |
| --- | --- | --- | --- |
| Mutex / Lock | Mutual exclusion: only one thread in the critical section | `threading.Lock`, `asyncio.Lock` | Avoid holding it across await/I/O; prefer `with lock:` to guarantee release |
| RLock (re-entrant) | Same thread can acquire multiple times without deadlocking itself | `threading.RLock` | Masks bad design; if you need re-entrance, your call graph may be tangled |
| Semaphore | Cap concurrency at N (connection pools, rate-limited clients) | `threading.Semaphore(N)`, `asyncio.Semaphore(N)` | A leaked acquire without release drains the pool over time |
| Condition variable | Wait for a predicate to become true; a notifier wakes waiters | `threading.Condition`, `asyncio.Condition` | Spurious wakeups: always re-check the predicate in a while-loop |
| Event | One-shot flag that one side sets, many can wait on | `threading.Event`, `asyncio.Event` | No counter: "set" is idempotent; use a Semaphore if you need counting |
| Queue | Thread/task-safe FIFO for producer-consumer | `queue.Queue`, `asyncio.Queue` | Unbounded queues leak memory when producers outpace consumers |
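
The condition-variable pitfall in miniature: a toy producer/consumer where the waiter re-checks its predicate in a while-loop, so a spurious wakeup (or a faster rival consumer) cannot make it pop from an empty list.

```python
import threading

items: list[int] = []
cond = threading.Condition()

def consumer(out: list[int]) -> None:
    with cond:
        # Re-check the predicate in a loop: wakeups can be spurious,
        # or another consumer may have taken the item first.
        while not items:
            cond.wait()
        out.append(items.pop())

def producer() -> None:
    with cond:
        items.append(42)
        cond.notify()  # wake one waiter; it still re-checks `items`

received: list[int] = []
c = threading.Thread(target=consumer, args=(received,))
c.start()
producer()
c.join()
print(received)  # [42]
```

An `if not items:` instead of the `while` would be the classic bug this table warns about.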

Deadlock

Coffman's four conditions: all four must hold for deadlock. Break any one and you cannot deadlock.

  • Mutual exclusion: at least one resource is non-shareable (only one holder at a time).
  • Hold and wait: a thread holds one resource while waiting for another.
  • No preemption: resources cannot be forcibly taken away from the holder.
  • Circular wait: a cycle of threads, each waiting for a resource held by the next.

Breaking deadlock

  • Global lock ordering: acquire locks in a total order across the whole system.
  • Try-acquire with timeout: abandon and retry on failure to take the second lock.
  • Reduce the hold: never make a blocking call while holding a lock.
  • Single-writer design: one thread owns mutations; others send messages.
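
A sketch of the first technique, global lock ordering: callers request the two locks in opposite orders (the classic deadlock setup), but a helper acquires them in one canonical order (here by `id()`, an arbitrary but consistent choice), so a cycle of waiters can never form.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
completed: list[str] = []

def in_order(x: threading.Lock, y: threading.Lock) -> tuple:
    # Global lock ordering: always acquire in one total order,
    # breaking the circular-wait condition.
    return tuple(sorted((x, y), key=id))

def transfer(name: str, first: threading.Lock, second: threading.Lock) -> None:
    l1, l2 = in_order(first, second)
    with l1, l2:  # both call sites end up acquiring in the same order
        completed.append(name)

threads = []
for _ in range(50):
    # Opposite request orders; without in_order() this could deadlock.
    threads.append(threading.Thread(target=transfer, args=("a->b", lock_a, lock_b)))
    threads.append(threading.Thread(target=transfer, args=("b->a", lock_b, lock_a)))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(completed))  # 100: every transfer finishes, no deadlock
```

In real systems the total order is usually something stable like account ID, not `id()`.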

Locking strategies

| Strategy | Flow | Best for | Cost |
| --- | --- | --- | --- |
| Pessimistic locking | `SELECT ... FOR UPDATE` → modify → commit; lock held for the transaction's duration | High contention, short critical sections (money transfers) | Blocks other readers/writers; risk of deadlock if lock order is inconsistent |
| Optimistic locking | Read the row with its version; write `UPDATE ... SET version = version + 1 WHERE version = ?`; retry on zero rows affected | Low contention (most rows are not concurrently edited) | Retries pile up under contention; livelock if the conflict rate is high |
| Lease-based | Take a time-bounded lease; the holder must renew it; it expires automatically | Distributed leader election, cache warmers, long-running work | Clock skew between nodes; two leaders possible during lease handoff |
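
A minimal optimistic-locking round trip, using an in-memory SQLite table as a stand-in for a real database; the table and column names are invented for illustration. The caller treats "zero rows affected" as "someone else won the race, re-read and retry."

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
db.execute("INSERT INTO account VALUES (1, 100, 0)")

def withdraw(amount: int) -> bool:
    # Optimistic: read the version, then write only if it is unchanged.
    balance, version = db.execute(
        "SELECT balance, version FROM account WHERE id = 1").fetchone()
    cur = db.execute(
        "UPDATE account SET balance = ?, version = ? "
        "WHERE id = 1 AND version = ?",
        (balance - amount, version + 1, version))
    return cur.rowcount == 1  # 0 rows means a concurrent writer won: retry

ok = withdraw(30)
balance = db.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(ok, balance)  # True 70
```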

Idempotency toolkit

| Technique | Description | Example |
| --- | --- | --- |
| Idempotency key | Client supplies a unique key per logical operation; the server stores the result and returns it on retry | Stripe: `Idempotency-Key: <uuid>` header on `POST /charges` |
| Natural key uniqueness | Use a business-meaningful unique constraint; the second INSERT is a no-op | `INSERT ... ON CONFLICT (email) DO NOTHING` makes double-signup safe |
| Compare-and-set | Write only if the current state matches the expected one; used in optimistic locking and DynamoDB conditional writes | `UPDATE row SET status='paid', version=version+1 WHERE id=? AND version=?` |
| Outbox pattern | Write the event to an outbox table in the same DB transaction as the business write; a separate publisher drains it | Guarantees "event published iff DB write committed" without distributed transactions |
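
A toy sketch of the idempotency-key technique, with an in-memory dict standing in for the server's key store; the key and field names are invented, and a real implementation would persist keys and also compare the request payload before replaying a stored result.

```python
# Store the result of the first call under the client's key;
# replays return the stored result instead of charging twice.
results: dict[str, dict] = {}

def create_charge(idempotency_key: str, amount: int) -> dict:
    if idempotency_key in results:
        return results[idempotency_key]  # retry: same response, no new charge
    charge = {"id": len(results) + 1, "amount": amount}  # the side effect
    results[idempotency_key] = charge
    return charge

first = create_charge("key-123", 500)
retry = create_charge("key-123", 500)  # network retry of the same request
print(first == retry)  # True: exactly one charge was created
```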

"Exactly-once" is a myth

"Exactly-once" end-to-end requires the consumer side effect and the broker ack to be one atomic transaction. For arbitrary side effects (HTTP calls, emails, file writes) this is impossible. What you actually build is "at-least-once delivery + idempotent consumer", which is effectively-once. When an interviewer asks how you would get exactly-once, they want to hear this answer, not a confident "use Kafka EOS".
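
Effectively-once in miniature: delivery is at-least-once (the broker redelivers after a lost ack), and a processed-ID set makes the consumer's side effect idempotent. The message IDs, payloads, and in-memory stores are invented for illustration; a real consumer would persist the processed set transactionally with the side effect.

```python
processed: set[str] = set()
side_effects: list[str] = []

def handle(message_id: str, payload: str) -> None:
    if message_id in processed:
        return  # duplicate delivery: ack it, do nothing
    side_effects.append(payload)  # the real work (HTTP call, email, ...)
    processed.add(message_id)

# The broker redelivers msg-1 after a lost ack.
for mid, body in [("msg-1", "charge"), ("msg-1", "charge"), ("msg-2", "refund")]:
    handle(mid, body)

print(side_effects)  # ['charge', 'refund']: the duplicate collapsed
```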