System Design Lite

The 30-minute whiteboard scenarios that appear in nearly every senior SWE loop. Each one names the invariants, not a vendor product.

How to use this page

Cover the Approach section and see if you can derive it from the Problem statement and the Clarifying Questions. The Follow-ups are what interviewers probe once you have an initial answer — be ready for all of them before you consider a scenario "done".

LRU cache

Time O(1) get / put  ·  Space O(capacity)

Problem

Cache with a fixed capacity that evicts the least-recently-used entry on insert when full. `get` and `put` must be O(1).

Clarify first

  • Is capacity fixed at construction or dynamic?
  • Is access (get) thread-safe? If yes, coarse lock or lock-free?
  • Are we optimizing for hit latency (in-memory) or hit rate (tiered cache)?

Approach

Hash map (key → node) gives O(1) lookup. Doubly-linked list keeps entries in recency order — move-to-head on access, drop-from-tail on eviction. The two structures share the same node objects so both pointers move in O(1).

Reference implementation

class Node:
    __slots__ = ('key', 'val', 'prev', 'next')

    def __init__(self, k, v):
        self.key, self.val, self.prev, self.next = k, v, None, None


class LRU:
    def __init__(self, cap: int):
        self.cap = cap
        self.map: dict[int, Node] = {}
        # Sentinel head/tail — removes null-checks
        self.head, self.tail = Node(0, 0), Node(0, 0)
        self.head.next, self.tail.prev = self.tail, self.head

    def _remove(self, n: Node) -> None:
        n.prev.next, n.next.prev = n.next, n.prev

    def _add_to_front(self, n: Node) -> None:
        n.prev, n.next = self.head, self.head.next
        self.head.next.prev = n
        self.head.next = n

    def get(self, key: int) -> int:
        n = self.map.get(key)
        if n is None:
            return -1
        self._remove(n); self._add_to_front(n)
        return n.val

    def put(self, key: int, val: int) -> None:
        if key in self.map:
            self._remove(self.map[key])
        n = Node(key, val)
        self.map[key] = n
        self._add_to_front(n)
        if len(self.map) > self.cap:
            lru = self.tail.prev
            self._remove(lru)
            del self.map[lru.key]

Follow-up questions

  • Make it thread-safe. What concurrency primitive is cheapest — a single lock, striped locks, or lock-free?
  • Implement LFU instead. Why is it harder to get O(1)?
  • How would you extend this to a distributed cache?

LFU cache (vs LRU)

Time O(1) get / put  ·  Space O(capacity)

Problem

Evict the least-frequently-used entry. On tie, break by recency.

Clarify first

  • Is this a pure interview question or are we benchmarking? LFU hit-rate is often worse than LRU on realistic workloads
  • How often do items age out? A pure frequency count never decays, so old popular items can stick forever

Approach

Hash map key → node, plus a hash map from frequency → doubly-linked list of nodes at that frequency. Track the current minimum frequency; on access, bump the node to the next frequency list. Eviction pops from the head of the min-frequency list.
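The bucket structure above can be sketched in Python. Names like `buckets` and `min_freq`, and the use of `OrderedDict` to break frequency ties by recency, are choices of this sketch, not a canonical API:

```python
from collections import defaultdict, OrderedDict

class LFU:
    """O(1) LFU sketch: freq -> ordered set of keys at that frequency."""
    def __init__(self, cap: int):
        self.cap = cap
        self.vals = {}                            # key -> value
        self.freq = {}                            # key -> use count
        self.buckets = defaultdict(OrderedDict)   # count -> keys, oldest first
        self.min_freq = 0

    def _touch(self, key):
        # Move key from its current frequency bucket to the next one up.
        f = self.freq[key]
        del self.buckets[f][key]
        if not self.buckets[f]:
            del self.buckets[f]
            if self.min_freq == f:
                self.min_freq = f + 1
        self.freq[key] = f + 1
        self.buckets[f + 1][key] = None

    def get(self, key):
        if key not in self.vals:
            return -1
        self._touch(key)
        return self.vals[key]

    def put(self, key, val):
        if self.cap <= 0:
            return
        if key in self.vals:
            self.vals[key] = val
            self._touch(key)
            return
        if len(self.vals) >= self.cap:
            # Evict the oldest key in the minimum-frequency bucket (recency tie-break).
            evict, _ = self.buckets[self.min_freq].popitem(last=False)
            if not self.buckets[self.min_freq]:
                del self.buckets[self.min_freq]
            del self.vals[evict], self.freq[evict]
        self.vals[key] = val
        self.freq[key] = 1
        self.buckets[1][key] = None
        self.min_freq = 1          # a fresh key always starts at frequency 1
```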

Follow-up questions

  • What is "frequency decay" and why do production LFU variants (TinyLFU, W-TinyLFU) use it?
  • Why do many systems use LRU even when LFU hit rates are technically higher?

Rate limiter

Time O(1) per request  ·  Space O(keys × window_granularity)

Problem

Reject requests that exceed a quota (e.g., 100 req / minute / user). Should work across many server instances.

Clarify first

  • Per-user, per-IP, per-API-key, or per-tenant?
  • Soft (429 + retry-after) or hard (drop the TCP connection)?
  • Is approximate OK, or do we need exact? Distributed exact-counting is expensive

Approach

Pick an algorithm (see table below). Store counters in a shared fast store (Redis with atomic INCR + EXPIRE is the canonical implementation). For distributed accuracy, run a token bucket as an atomic Lua script; when per-key memory matters most, use a sliding-window counter.

Rate limiter algorithms

Algorithm · Mechanism · Behavior

  • Fixed window · counter per window (e.g., per minute) · burst at the window boundary can reach 2× the intended limit
  • Sliding log · one timestamp per request in a sorted set · exact, but memory scales with request volume — expensive at scale
  • Sliding window counter · two adjacent fixed-window counts, weighted by time-in-current-window · good accuracy with O(1) memory per key — the pragmatic default
  • Token bucket · bucket refills at a fixed rate; each request consumes a token · allows short bursts up to bucket size — user-friendly for interactive APIs
  • Leaky bucket · requests queue and drain at a fixed rate · smooths output but adds latency — fits traffic shaping more than rate limiting

Rule of thumb: Start with sliding-window counter for general API rate limiting. Use token bucket when short bursts are a feature. Avoid fixed window for anything public-facing.

Follow-up questions

  • What happens when Redis is unreachable? Fail-open or fail-closed?
  • How do you handle clock skew across instances?
  • What if one user generates 99% of traffic? (Hot-key problem)

Bloom filter

Time O(k) per op  ·  Space O(m) bits — typically ~9.6 bits per element for 1% FPR

Problem

Probabilistic set membership — "is X probably in the set?" — with O(1) queries and roughly 10 bits per element at a 1% false-positive rate. Never false-negatives; may false-positive at a configurable rate.

Clarify first

  • What false-positive rate is acceptable? 1%? 0.01%?
  • Do we need to remove elements? Standard Bloom filters cannot — you need a Counting Bloom Filter

Approach

Bit array of size m. On insert, hash the element with k independent hash functions and set those k bits. On query, if any of the k bits is 0 → definitely not in set; if all are 1 → probably in set. For n elements and target false-positive rate p, size m = -n·ln(p) / (ln 2)² and use k = (m/n)·ln 2 hash functions.
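A self-contained sketch sized from n and p. Double hashing over SHA-256 to derive the k indices is a common trick to simulate k independent hash functions, not a requirement of the structure:

```python
import math
import hashlib

class BloomFilter:
    def __init__(self, n: int, p: float):
        # m = -n ln(p) / (ln 2)^2 bits; k = (m/n) ln 2 hash functions
        self.m = max(1, int(-n * math.log(p) / (math.log(2) ** 2)))
        self.k = max(1, round((self.m / n) * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: k indices from two 64-bit halves of one SHA-256 digest.
        h = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(h[:8], "big")
        h2 = int.from_bytes(h[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # Any unset bit proves absence; all set bits only suggest presence.
        return all(self.bits[pos // 8] >> (pos % 8) & 1 for pos in self._positions(item))
```

For n = 1000 and p = 0.01 this sizes the filter at 9585 bits with 7 hash functions, matching the ~9.6 bits/element rule of thumb.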

Follow-up questions

  • Where is this used in real systems?
  • Counting Bloom Filter vs Cuckoo Filter — when does each win?
  • Why do LSM-tree databases use Bloom filters per SSTable?

Consistent hashing

Time O(log N) lookup with a sorted ring + binary search  ·  Space O(N × virtual_count)

Problem

Distribute keys across N nodes. When a node is added or removed, move as few keys as possible — ideally ~1/N of them.

Clarify first

  • How many nodes and how many keys? (Affects load balance)
  • Is node membership static or changing frequently?
  • Do we need replication? Usually yes — store at the next K nodes on the ring

Approach

Hash each node onto a ring (0 to 2^32 - 1). Hash each key onto the same ring; the key belongs to the first node clockwise. Adding / removing a node only moves the keys in that node's ring segment. Use virtual nodes (many ring positions per real node, e.g., 150×) to smooth out load imbalance.
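A compact sketch of the ring with virtual nodes; `vnodes=150` and MD5 truncated to 32 bits are arbitrary choices for this sketch, matching the 0..2³²-1 ring above:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes: int = 150):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # First 4 bytes of MD5 -> a position in 0..2**32 - 1.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

    def lookup(self, key: str) -> str:
        pos = self._hash(key)
        i = bisect.bisect(self.ring, (pos, ""))   # first vnode clockwise
        return self.ring[i % len(self.ring)][1]   # wrap past the end of the ring
```

The test to run in your head: removing a node must only remap the keys that node owned, and nothing else.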

Follow-up questions

  • What if the hash ring has one node with 10× the keys? (Virtual nodes fix this)
  • Why does mod-N hashing (`hash(key) % N`) fall apart on rescale?
  • Rendezvous hashing as an alternative — when does it win?

Distributed ID generation

Time O(1) per ID  ·  Space O(1) state per worker

Problem

Generate unique IDs across many machines without coordination. Which ID shape fits the use case?

Clarify first

  • Must IDs be sortable by creation time?
  • Is 128-bit OK, or do we need 64-bit for DB index efficiency?
  • Must we avoid leaking cardinality (guessable IDs are an anti-pattern for public resources)?

Approach

Choose by shape, not by product name:

  • **Random 128-bit (UUID v4)** — trivial to generate, no coordination, not sortable. Bad for B-tree primary keys (random inserts shred the index).
  • **Time-ordered 128-bit (UUID v7 / ULID)** — first 48 bits are a millisecond timestamp, rest is random. Sortable, index-friendly, no coordination.
  • **Time-ordered 64-bit with machine id (Snowflake-shape)** — timestamp + datacenter/worker id + sequence. Requires a unique worker-id assignment process. Fits in a bigint PK.
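A sketch of the Snowflake shape with a 41-bit millisecond timestamp, 10-bit worker id, and 12-bit sequence. The custom epoch and the naive clock-backwards handling are placeholders for this sketch, not production choices:

```python
import time
import threading

class SnowflakeGen:
    EPOCH = 1_600_000_000_000  # custom epoch in ms; arbitrary for this sketch

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024          # must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                # Real systems wait out the skew or raise an alert instead.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:             # 4096 ids this ms: spin to the next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            # 41-bit timestamp | 10-bit worker | 12-bit sequence
            return ((now - self.EPOCH) << 22) | (self.worker_id << 12) | self.seq
```

Worker-id assignment (the hard part in practice) is taken as given here; see the follow-ups.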

Follow-up questions

  • How do you assign worker IDs at scale without a coordinator?
  • What happens to Snowflake-shape IDs when the clock goes backwards (NTP correction, leap smear)?
  • Why would you NOT use a random-UUID as the primary key of a large InnoDB table?

Messaging primitives — pick the right shape

Time Producer O(1) + broker round-trip  ·  Space Retention-dependent

Problem

An async workflow needs to hand off work between services. Which messaging shape fits?

Clarify first

  • Exactly once? At least once? At most once? (Exactly-once is almost always a lie — aim for at-least-once + idempotent consumers)
  • Do consumers need to replay old messages?
  • One consumer per message, or fan-out?

Approach

Three shapes, anchored on semantics (products in parens are examples, not prescriptions):

  • **Append-only log** — ordered, replayable, consumer-tracks-offset. Multiple independent consumers read at their own pace. Good for event sourcing, analytics fan-out (Kafka, Kinesis, Pulsar).
  • **Work queue** — each message goes to one consumer; ack / nack on completion; visibility timeout protects against dead workers. Good for tasks (RabbitMQ work queues, SQS, Beanstalkd).
  • **Pub/sub broadcast** — every subscriber gets every message; no durable offset. Good for live notifications (Redis Pub/Sub, SNS).

Do not assume the named product behaves exactly like the shape — features drift. Read the docs of whichever one you actually pick.
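The "at-least-once plus idempotent consumers" answer from the clarifying questions can be made concrete in a few lines. The in-memory `seen` set stands in for a durable dedup store; in production it would be written in the same transaction as the side effect:

```python
class IdempotentConsumer:
    """Skip messages whose id was already processed, making redelivery harmless."""
    def __init__(self, handler):
        self.handler = handler
        self.seen = set()

    def consume(self, msg_id: str, payload) -> bool:
        if msg_id in self.seen:
            return False            # duplicate redelivery: side effect already done
        self.handler(payload)
        self.seen.add(msg_id)       # record only after the effect succeeds
        return True
```

The broker may now deliver a message twice (crash between handler and ack) without the effect happening twice.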

Follow-up questions

  • How do you achieve idempotent consumers on top of at-least-once delivery?
  • Why is "exactly-once" usually misleading marketing?
  • When do you need a dead-letter queue, and how do you configure it?

Cache strategies

Time O(1) cached read; O(db) on miss  ·  Space Bounded by cache capacity

Problem

A hot read path is hitting the database too hard. Where and how do you cache?

Clarify first

  • Is the underlying data changing, and how stale can the cache be?
  • Is write volume high? (Write-through has a write tax; cache-aside does not)
  • Can two consumers read the same key concurrently? (Cache stampede)

Approach

  • **Cache-aside** (most common) — app checks cache → miss → reads DB → writes cache. Simple, but subject to stampede and needs TTLs.
  • **Write-through** — writes go to cache AND DB atomically; reads always hit cache. No staleness, but the write path pays the cache cost.
  • **Write-back** — writes go to cache; the DB is updated asynchronously. Fastest writes; worst durability.
  • **Write-around** — writes skip the cache and go straight to the DB; only reads populate the cache. Good when writes are seldom re-read soon after.
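A single-process sketch of cache-aside with per-key single-flight, so a miss storm on one key issues only one backing read. `load` stands in for the database, and the lock table is never pruned, which a real implementation would fix:

```python
import threading

class CacheAside:
    def __init__(self, load):
        self.load = load            # backing read, e.g. a DB query
        self.cache = {}
        self.locks = {}             # key -> per-key single-flight lock
        self.guard = threading.Lock()

    def get(self, key):
        val = self.cache.get(key)
        if val is not None:
            return val              # fast path: cache hit, no locking
        with self.guard:
            lock = self.locks.setdefault(key, threading.Lock())
        with lock:                  # only one caller loads a given key
            val = self.cache.get(key)   # re-check: a racer may have filled it
            if val is None:
                val = self.load(key)
                self.cache[key] = val
        return val
```

The double-check inside the lock is what turns "N concurrent misses" into one DB read plus N-1 cache hits.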

Follow-up questions

  • What is a cache stampede and how do single-flight / probabilistic early expiration fix it?
  • Thundering herd after a cache flush — how do you warm it safely?
  • Why can TTL-based eviction cause cache-miss spikes? How do you smooth them?

Sharding strategies

Time O(1) with known shard key  ·  Space Proportional to shard

Problem

A single database cannot hold the data or handle the load. How do you split it across many?

Clarify first

  • What is the query pattern — single-entity lookups, range scans, or analytics?
  • What is the natural partition key (user_id, tenant_id, region)?
  • How often do you need cross-shard queries? (Expensive — design to avoid)

Approach

  • **Range sharding** — partition by key range (ids 0–1M on shard A, 1M–2M on B). Good for range scans; creates hotspots if ranges are unbalanced or monotonic (e.g., timestamp → last shard gets all writes).
  • **Hash sharding** — partition by hash(key) mod N. Balances load evenly; destroys range-scan locality. Rescaling is painful unless you use consistent hashing.
  • **Directory sharding** — a lookup service maps key → shard. Maximum flexibility; the directory becomes a bottleneck and must itself be scaled.
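The first two routing strategies fit in a few lines; the boundary values and the MD5 choice here are illustrative only:

```python
import bisect
import hashlib

def hash_shard(key: str, n_shards: int) -> int:
    """Hash sharding: even spread, no range locality."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return h % n_shards

def range_shard(key: int, boundaries: list[int]) -> int:
    """Range sharding: boundaries are upper bounds, e.g. [1_000_000, 2_000_000]
    routes ids below 1M to shard 0, below 2M to shard 1, the rest to shard 2."""
    return bisect.bisect(boundaries, key)
```

Note what `hash_shard` loses: a scan over ids 100..200 now touches every shard, which is exactly the range-scan-locality trade-off named above.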

Follow-up questions

  • How do you reshard without downtime?
  • What is a "hot shard" and what do you do when one user generates 20% of traffic?
  • Cross-shard JOIN: when to denormalize vs fan-out query vs move to a warehouse?

Leader election

Time O(n) messages per election (candidate contacts all peers); O(n) per commit for majority replication  ·  Space O(log length) persisted per node

Problem

A set of processes must elect exactly one leader. The leader coordinates; followers replicate. What goes wrong, and how do you make it safe?

Clarify first

  • Is strong consistency required, or is eventual convergence OK?
  • What happens if the network splits? (Split-brain is the classic failure mode)
  • How quickly must a new leader take over after a failure?

Approach

**Use a proven consensus protocol** (Raft, Paxos, Zab) or a coordinator (etcd, ZooKeeper, Consul). Do not hand-roll. Key guarantee: at most one leader per term, and any decision a leader makes is replicated to a majority quorum before commit. For a quick interview answer, describe Raft: (1) each process has a term counter; (2) followers promote to candidate on timeout and request votes; (3) a candidate with a majority becomes leader; (4) the leader heartbeats; (5) partitioned leaders step down when they cannot reach a quorum. The "bully algorithm" is the textbook simple version but does not handle partitions safely.
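The quorum arithmetic behind the majority guarantee is worth having cold; these two helpers show why a 4-node cluster tolerates no more failures than a 3-node one:

```python
def quorum(n: int) -> int:
    """Majority quorum size for an n-node cluster."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return n - quorum(n)   # equals (n - 1) // 2
```

Because two quorums of size n//2 + 1 must overlap in at least one node, two leaders can never both reach a majority in the same term, which is what rules out split-brain.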

Follow-up questions

  • Why does split-brain lead to two writers? How does a quorum prevent it?
  • Lease-based leader election — what does the lease protect against?
  • Why does "odd number of nodes" matter for quorum (3, 5, 7)?