Concurrency

Python-flavored: which mode fits the workload, the GIL, asyncio pitfalls, lock primitives, deadlock, and the idempotency toolkit that makes retries safe.

The first decision

Before writing any concurrent code, name whether the workload is I/O-bound or CPU-bound. Get this wrong and you either add threads that cannot speed up CPU-bound code (the GIL) or add processes with massive IPC overhead you did not need. Everything else follows from this decision.

Concurrency modes

| Mode | Python tool | Best for | Blind spot |
| --- | --- | --- | --- |
| Synchronous | Plain function calls | CPU work, simple scripts, anything without I/O | A single blocking call (DB query, HTTP call) stalls everything; latency adds up |
| Threads (concurrent I/O) | `threading`, `concurrent.futures.ThreadPoolExecutor` | I/O-bound work where you need familiar blocking APIs (requests, DB drivers) | GIL prevents true parallelism of Python bytecode; race conditions on shared state |
| Async (cooperative) | `asyncio`, `async`/`await`, `aiohttp`, `asyncpg` | Massive concurrent I/O (10k+ sockets); lower overhead than threads | One blocking call (sync DB driver, CPU loop) freezes the entire event loop |
| Processes (parallel CPU) | `multiprocessing`, `concurrent.futures.ProcessPoolExecutor` | CPU-bound work that must use multiple cores | IPC is expensive (pickling); fork semantics on Linux vs spawn on Windows/macOS |
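
To make the threaded-I/O row concrete, here is a minimal sketch using `ThreadPoolExecutor`, with `time.sleep` standing in for a network call and placeholder example.com URLs. Eight 0.1 s waits overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # Simulated I/O-bound call: the GIL is released while sleeping,
    # so the threads overlap these waits.
    time.sleep(0.1)
    return f"body of {url}"

urls = [f"https://example.com/{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    bodies = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# Eight 0.1 s sleeps overlap: total is well under the 0.8 s a sync loop needs.
print(f"{len(bodies)} responses in {elapsed:.2f}s")
```

Swap `fetch` for a real blocking HTTP or DB call and the shape is identical; swap the pool for `ProcessPoolExecutor` only when the work is CPU-bound.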

The GIL

| Fact | Implication |
| --- | --- |
| Only one thread executes Python bytecode at a time in CPython | Adding threads does not speed up CPU-bound Python code; use processes instead |
| The GIL is released during I/O syscalls and in many C extensions (NumPy, hashlib) | Threaded I/O and threaded NumPy work fine; the GIL hurts pure-Python CPU loops |
| Python 3.13 ships an experimental free-threaded (no-GIL) build (PEP 703) | Still opt-in and slower for single-threaded code; not the interview default yet |
| The GIL does NOT make Python thread-safe | Reads and writes to a dict or int may still interleave at arbitrary points; you still need locks |
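
A minimal illustration of the last row: `counter += 1` compiles to a read-modify-write sequence, and the GIL can switch threads between those bytecodes, so a `threading.Lock` is still needed for a correct total.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n: int) -> None:
    global counter
    for _ in range(n):
        # Without the lock, two threads can read the same value,
        # both add 1, and one update is lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(25_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 100000 every run; without the lock it can come up short
```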

Asyncio pitfalls

| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Blocking call in an async function | Event loop stalls; latency spikes to seconds because everything queues behind one call | Use async-native libraries (aiohttp, asyncpg); for unavoidable sync calls, `asyncio.to_thread(func)` |
| Unawaited coroutine | `RuntimeWarning: coroutine was never awaited`; the function never actually runs | Always `await` the call, or schedule it with `asyncio.create_task(...)` if fire-and-forget |
| Fire-and-forget task never completes | Task is garbage-collected before it runs; silent failure | Hold a reference: `task = asyncio.create_task(coro); tasks.add(task)`; remove it when done |
| Mixing sync and async by accident | `asyncio.run(coro)` called from inside a running loop raises `RuntimeError` | Within async code, `await coro`; `asyncio.run` is only a top-level entrypoint |
| Unbounded concurrency | `asyncio.gather(*[fetch(u) for u in urls])` over 10,000 URLs DDoSes the target | Bound with `asyncio.Semaphore(N)` or process in batches |
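
A sketch of the bounded-concurrency fix, with `asyncio.sleep` and placeholder URLs standing in for a real async HTTP client: the semaphore caps in-flight requests at 10 even though 100 coroutines are created.

```python
import asyncio

async def fetch(url: str, sem: asyncio.Semaphore) -> str:
    # The semaphore caps how many fetches run at once.
    async with sem:
        await asyncio.sleep(0.01)  # stand-in for an aiohttp request
        return url

async def main() -> list[str]:
    sem = asyncio.Semaphore(10)  # at most 10 in flight
    urls = [f"https://example.com/{i}" for i in range(100)]
    # gather awaits every coroutine, so nothing is fire-and-forget here.
    return await asyncio.gather(*(fetch(u, sem) for u in urls))

results = asyncio.run(main())  # asyncio.run only at the top level
print(len(results))
```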

Synchronization primitives

| Primitive | Purpose | Python API | Pitfall |
| --- | --- | --- | --- |
| Mutex / Lock | Mutual exclusion: only one thread in the critical section | `threading.Lock`, `asyncio.Lock` | Avoid holding it across await/I/O; prefer `with lock:` to guarantee release |
| RLock (re-entrant) | Same thread can acquire multiple times without deadlocking itself | `threading.RLock` | Masks bad design; if you need re-entrance, your call graph may be tangled |
| Semaphore | Cap concurrency at N (connection pools, rate-limited clients) | `threading.Semaphore(N)`, `asyncio.Semaphore(N)` | A leaked acquire without release drains the pool over time |
| Condition variable | Wait for a predicate to become true; a notifier wakes waiters | `threading.Condition`, `asyncio.Condition` | Spurious wakeups: always re-check the predicate in a while-loop |
| Event | One-shot flag that one side sets, many can wait on | `threading.Event`, `asyncio.Event` | No counter: "set" is idempotent; use a Semaphore if you need counting |
| Queue | Thread/task-safe FIFO for producer-consumer | `queue.Queue`, `asyncio.Queue` | Unbounded queues leak memory when producers outpace consumers |
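
The condition-variable pitfall in miniature: a toy producer/consumer where the waiter re-checks its predicate in a while-loop, so a spurious wakeup (or a faster rival consumer) cannot make it pop from an empty list.

```python
import threading

items: list[int] = []
cond = threading.Condition()

def consumer(out: list[int]) -> None:
    with cond:
        # Re-check the predicate in a loop: wakeups can be spurious,
        # or another consumer may have taken the item first.
        while not items:
            cond.wait()
        out.append(items.pop())

def producer() -> None:
    with cond:
        items.append(42)
        cond.notify()  # wake one waiter; it still re-checks `items`

received: list[int] = []
c = threading.Thread(target=consumer, args=(received,))
c.start()
producer()
c.join()
print(received)  # [42]
```

An `if not items:` instead of the `while` would be the classic bug this table warns about.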

Deadlock

Coffman's four conditions: all four must hold for deadlock. Break any one and you cannot deadlock.

  • Mutual exclusion: at least one resource is non-shareable (only one holder at a time).
  • Hold and wait: a thread holds one resource while waiting for another.
  • No preemption: resources cannot be forcibly taken away from the holder.
  • Circular wait: a cycle of threads, each waiting for a resource held by the next.

Breaking deadlock

  • Global lock ordering: acquire locks in a total order across the whole system.
  • Try-acquire with timeout: abandon and retry on failure to take the second lock.
  • Reduce the hold: never make a blocking call while holding a lock.
  • Single-writer design: one thread owns mutations; others send messages.
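
A sketch of the first technique, global lock ordering: callers request the two locks in opposite orders (the classic deadlock setup), but a helper acquires them in one canonical order (here by `id()`, an arbitrary but consistent choice), so a cycle of waiters can never form.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
completed: list[str] = []

def in_order(x: threading.Lock, y: threading.Lock) -> tuple:
    # Global lock ordering: always acquire in one total order,
    # breaking the circular-wait condition.
    return tuple(sorted((x, y), key=id))

def transfer(name: str, first: threading.Lock, second: threading.Lock) -> None:
    l1, l2 = in_order(first, second)
    with l1, l2:  # both call sites end up acquiring in the same order
        completed.append(name)

threads = []
for _ in range(50):
    # Opposite request orders; without in_order() this could deadlock.
    threads.append(threading.Thread(target=transfer, args=("a->b", lock_a, lock_b)))
    threads.append(threading.Thread(target=transfer, args=("b->a", lock_b, lock_a)))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(completed))  # 100: every transfer finishes, no deadlock
```

In real systems the total order is usually something stable like account ID, not `id()`.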

Locking strategies

| Strategy | Flow | Best for | Cost |
| --- | --- | --- | --- |
| Pessimistic locking | `SELECT ... FOR UPDATE` → modify → commit; lock held for the transaction's duration | High contention, short critical sections (money transfers) | Blocks other readers/writers; risk of deadlock if lock order is inconsistent |
| Optimistic locking | Read the row with its version; write `UPDATE ... SET version = version + 1 WHERE version = ?`; retry on zero rows affected | Low contention (most rows are not concurrently edited) | Retries pile up under contention; livelock if the conflict rate is high |
| Lease-based | Take a time-bounded lease; the holder must renew it; it expires automatically | Distributed leader election, cache warmers, long-running work | Clock skew between nodes; two leaders possible during lease handoff |
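
A minimal optimistic-locking round trip, using an in-memory SQLite table as a stand-in for a real database; the table and column names are invented for illustration. The caller treats "zero rows affected" as "someone else won the race, re-read and retry."

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
db.execute("INSERT INTO account VALUES (1, 100, 0)")

def withdraw(amount: int) -> bool:
    # Optimistic: read the version, then write only if it is unchanged.
    balance, version = db.execute(
        "SELECT balance, version FROM account WHERE id = 1").fetchone()
    cur = db.execute(
        "UPDATE account SET balance = ?, version = ? "
        "WHERE id = 1 AND version = ?",
        (balance - amount, version + 1, version))
    return cur.rowcount == 1  # 0 rows means a concurrent writer won: retry

ok = withdraw(30)
balance = db.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(ok, balance)  # True 70
```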

Idempotency toolkit

| Technique | Description | Example |
| --- | --- | --- |
| Idempotency key | Client supplies a unique key per logical operation; the server stores the result and returns it on retry | Stripe: `Idempotency-Key: <uuid>` header on `POST /charges` |
| Natural key uniqueness | Use a business-meaningful unique constraint; the second INSERT is a no-op | `INSERT ... ON CONFLICT (email) DO NOTHING` makes double-signup safe |
| Compare-and-set | Write only if the current state matches the expected one; used in optimistic locking and DynamoDB conditional writes | `UPDATE row SET status='paid', version=version+1 WHERE id=? AND version=?` |
| Outbox pattern | Write the event to an outbox table in the same DB transaction as the business write; a separate publisher drains it | Guarantees "event published iff DB write committed" without distributed transactions |
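
A toy sketch of the idempotency-key technique, with an in-memory dict standing in for the server's key store; the key and field names are invented, and a real implementation would persist keys and also compare the request payload before replaying a stored result.

```python
# Store the result of the first call under the client's key;
# replays return the stored result instead of charging twice.
results: dict[str, dict] = {}

def create_charge(idempotency_key: str, amount: int) -> dict:
    if idempotency_key in results:
        return results[idempotency_key]  # retry: same response, no new charge
    charge = {"id": len(results) + 1, "amount": amount}  # the side effect
    results[idempotency_key] = charge
    return charge

first = create_charge("key-123", 500)
retry = create_charge("key-123", 500)  # network retry of the same request
print(first == retry)  # True: exactly one charge was created
```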

"Exactly-once" is a myth

"Exactly-once" end-to-end requires the consumer side effect and the broker ack to be one atomic transaction. For arbitrary side effects (HTTP calls, emails, file writes) this is impossible. What you actually build is "at-least-once delivery + idempotent consumer", which is effectively-once. When an interviewer asks how you would get exactly-once, they want to hear this answer, not a confident "use Kafka EOS".
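
Effectively-once in miniature: delivery is at-least-once (the broker redelivers after a lost ack), and a processed-ID set makes the consumer's side effect idempotent. The message IDs, payloads, and in-memory stores are invented for illustration; a real consumer would persist the processed set transactionally with the side effect.

```python
processed: set[str] = set()
side_effects: list[str] = []

def handle(message_id: str, payload: str) -> None:
    if message_id in processed:
        return  # duplicate delivery: ack it, do nothing
    side_effects.append(payload)  # the real work (HTTP call, email, ...)
    processed.add(message_id)

# The broker redelivers msg-1 after a lost ack.
for mid, body in [("msg-1", "charge"), ("msg-1", "charge"), ("msg-2", "refund")]:
    handle(mid, body)

print(side_effects)  # ['charge', 'refund']: the duplicate collapsed
```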