Databases

The four decisions interviewers push on: relational vs not, isolation level, replication mode, and sharding strategy. Plus a reminder on indexing.

The senior framing

"Which database" is the wrong first question. The right questions are: what is the access pattern, what consistency do I need, what is the read/write mix, and what is the failure mode I cannot tolerate? The database falls out of the answers.

SQL vs NoSQL

SQL vs NoSQL — pick by access pattern, not by hype

| Criterion | Relational (Postgres, MySQL) | Non-relational (varies) |
|---|---|---|
| Data shape | Normalized tables with a known schema; cross-entity joins cheap | Denormalized document / KV / wide-column; joins expensive or absent |
| Consistency | Strong by default; ACID transactions across rows and tables | Eventual by default in most distributed stores; tunable per op in some |
| Scale ceiling | Vertical first; horizontal needs an explicit sharding strategy | Horizontal by design (DynamoDB, Cassandra); scale-out is the point |
| Query flexibility | Ad-hoc queries on any indexed column; analytics-friendly | Only the access patterns you designed for; redesign to add new ones |
| Best fit | OLTP, financial, anything with cross-entity integrity | High-volume KV, time-series, session stores, known-shape workloads |

Rule of thumb: Default to Postgres until you have a concrete reason not to. "We might need scale" is not a reason; millions of rows fit on one Postgres node.

Isolation levels

Ordered weakest → strongest. Know what each level allows, not just what it prevents — that is where bugs live.

| Level | Prevents | Still allows | Typical use |
|---|---|---|---|
| Read Uncommitted | Nothing beyond corruption | Dirty reads, non-repeatable reads, phantoms, lost updates | Never in practice. Many engines silently upgrade to Read Committed. |
| Read Committed | Dirty reads | Non-repeatable reads, phantoms, lost updates | Postgres default. Fine for most web apps — each statement sees a fresh snapshot. |
| Repeatable Read | Dirty reads, non-repeatable reads | Phantoms per the SQL standard (MySQL InnoDB blocks them anyway via gap locks); lost updates without SELECT FOR UPDATE | MySQL InnoDB default. Postgres Repeatable Read is actually snapshot isolation. |
| Snapshot Isolation | Dirty reads, non-repeatable reads, phantoms | Write skew (two transactions read overlapping sets, write non-overlapping rows, both commit) | Postgres Repeatable Read. Good general default; watch for write skew in banking-style invariants. |
| Serializable | All of the above, including write skew | Nothing — equivalent to some serial order | Financial / invariant-critical paths. Highest cost; expect aborts and retry loops. |
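Write skew is the subtlest row in that table, so here is a toy simulation of it in plain Python (no real database involved). The scenario and invariant are the classic on-call-doctors example, used here purely as an illustration: each "transaction" reads its own consistent snapshot, checks the invariant against that snapshot, then writes a row the other transaction never touched — so a snapshot-isolation engine's write-conflict check passes for both, yet the combined result violates the invariant.

```python
# Simulated write skew under snapshot isolation. Invariant: at least one
# doctor must remain on call. Neither transaction writes a row the other
# wrote, so neither commit is rejected.

def run_write_skew_demo():
    # Shared "table": doctor -> on_call flag.
    db = {"alice": True, "bob": True}

    # Both transactions take their snapshot before either commits.
    snapshot_t1 = dict(db)
    snapshot_t2 = dict(db)

    # T1: Alice goes off call. Her snapshot shows 2 on call, so it's allowed.
    if sum(snapshot_t1.values()) >= 2:
        db["alice"] = False  # commit succeeds: T1 wrote only "alice"

    # T2: Bob does the same. HIS snapshot also shows 2 on call.
    if sum(snapshot_t2.values()) >= 2:
        db["bob"] = False    # commit succeeds: T2 wrote only "bob"

    return db

result = run_write_skew_demo()
assert sum(result.values()) == 0  # nobody on call: invariant broken
```

Serializable isolation would abort one of the two; under snapshot isolation the standard fix is `SELECT FOR UPDATE` on the rows whose count you checked, which materializes the conflict.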

Replication modes

| Mode | How it works | Consistency | Trade-off |
|---|---|---|---|
| Single-leader async | One primary accepts writes; replicas apply the WAL after commit. | Read-your-writes on the primary; stale reads on replicas. | Simple + fast writes. Failover may lose the committed-but-unreplicated tail of the log. |
| Single-leader sync (or semi-sync) | Primary waits for ≥1 replica to ack before acknowledging the client. | No data loss on failover for acked writes. | Write latency = primary + slowest synced replica. One slow replica stalls writes. |
| Multi-leader | Writes accepted at any leader; leaders replicate asynchronously to each other. | Conflicting writes require resolution (last-write-wins, CRDTs, app-level merge). | Good for multi-region write locality. Conflict resolution is the hard part. |
| Leaderless (quorum) | Client writes to W replicas, reads from R; overlap ensures freshness when W + R > N. | Tunable per request. Hinted handoff and read repair fix stragglers. | Complex failure modes; strong consistency is possible but expensive. DynamoDB, Cassandra. |
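The quorum-overlap condition is worth internalizing as arithmetic, not folklore: if every write lands on W of N replicas and every read consults R, then W + R > N forces the two sets to intersect, so at least one replica in every read set has the latest acked write. A minimal sketch (this ignores sloppy quorums and hinted handoff, which weaken the guarantee in practice):

```python
# Leaderless-quorum freshness check: any W-replica write set and any
# R-replica read set must share at least one replica when W + R > N
# (pigeonhole: W + R replicas drawn from only N distinct ones).

def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every read quorum must intersect every write quorum."""
    return w + r > n

# Typical Dynamo-style config: N=3, W=2, R=2 -> fresh reads guaranteed.
assert quorums_overlap(3, 2, 2)
# W=1, R=1 on N=3 is fast, but a read may hit a replica the write missed.
assert not quorums_overlap(3, 1, 1)
```

Tuning per request means choosing W and R per operation: W=N, R=1 for read-heavy workloads; W=1, R=N when writes must be cheap and reads can pay.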

Sharding strategies

| Strategy | How it works | Pros | Cons |
|---|---|---|---|
| Range | Partition by key range: ids 0–1M → shard A, 1M–2M → shard B. | Range scans stay on one shard; easy to reason about. | Monotonic keys (timestamps, autoincrement) create a hot shard on inserts. |
| Hash | Partition by hash(key) mod N (or consistent hashing). | Even load distribution; no natural hotspots. | Destroys range-scan locality; rescaling is painful without consistent hashing. |
| Directory / lookup | A routing service stores a key → shard mapping. | Maximum flexibility; resharding is metadata-only. | Directory is a new bottleneck + single point of failure; needs its own scale plan. |
| Geographic | Partition by region / tenant jurisdiction. | Latency + data-residency (GDPR, HIPAA) solved in one move. | Cross-region queries are the slow path; rebalance is political, not just technical. |
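The "rescaling is painful" cell deserves a number. A sketch comparing naive modulo sharding against a consistent-hash ring (virtual-node style); the class and vnode count are illustrative choices, not any particular library's API:

```python
import bisect
import hashlib

def stable_hash(key: str) -> int:
    # md5 gives a stable, well-mixed value; Python's hash() varies per process.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

# Naive modulo sharding: adding one shard remaps most keys.
def mod_shard(key: str, n_shards: int) -> int:
    return stable_hash(key) % n_shards

# Consistent hashing: each shard owns many points ("vnodes") on a ring; a key
# belongs to the first shard point clockwise. Adding a shard only steals
# ~1/N of the keyspace from the existing shards.
class ConsistentHashRing:
    def __init__(self, shards, vnodes=100):
        self._ring = sorted(
            (stable_hash(f"{s}#{i}"), s) for s in shards for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def shard_for(self, key: str) -> str:
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]

keys = [f"user:{i}" for i in range(10_000)]

# Modulo: going from 3 to 4 shards moves ~3/4 of all keys.
moved_mod = sum(mod_shard(k, 3) != mod_shard(k, 4) for k in keys)

# Consistent hashing: the same change moves only ~1/4 of keys.
before = ConsistentHashRing(["a", "b", "c"])
after = ConsistentHashRing(["a", "b", "c", "d"])
moved_ring = sum(before.shard_for(k) != after.shard_for(k) for k in keys)

print(f"modulo moved {moved_mod}, ring moved {moved_ring} of {len(keys)}")
```

The intuition for the modulo number: a key stays put only when `h mod 3 == h mod 4`, which holds for just 3 of every 12 hash values.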

Index types

| Type | Best for | Avoid for | Note |
|---|---|---|---|
| B-tree (default) | Equality + range + ORDER BY on indexed columns. | Full-text search, high-cardinality multi-column filters with arbitrary order. | Postgres / MySQL default. 99% of indexes you create are B-tree. |
| Hash | Pure equality lookup at scale. | Range, ORDER BY, prefix matching. | Rarely worth it — B-tree equality lookups are already fast. |
| GIN / inverted | Full-text, JSONB containment, arrays. | Low-cardinality columns where a bitmap scan is better. | Postgres GIN. Larger + slower to write than B-tree but enables `@>` and full-text queries. |
| Covering (INCLUDE) | Index-only scans — query reads only indexed + included columns. | Wide payloads (defeats the memory-density point of an index). | Index stores extra non-key columns so the row lookup is skipped. |
| Partial | Indexing a hot subset (`WHERE status = 'active'`) to shrink index size. | Predicates that drift over time (requires reindex). | Huge wins when the hot slice is <5% of rows. |
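A runnable sketch of the partial-index row, using SQLite as a stand-in for Postgres (the `CREATE INDEX ... WHERE` syntax is the same, though the planners differ; the `orders` schema is invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, user_id INTEGER)"
)
# ~5% of rows are 'active' -- the hot slice worth indexing.
conn.executemany(
    "INSERT INTO orders (status, user_id) VALUES (?, ?)",
    [("active" if i % 20 == 0 else "archived", i % 100) for i in range(2000)],
)

# Partial index: only the active slice is indexed, so the index stays tiny.
conn.execute(
    "CREATE INDEX idx_orders_active ON orders (user_id) WHERE status = 'active'"
)

# A query whose predicate implies the index predicate is eligible to use it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM orders WHERE status = 'active' AND user_id = 7"
).fetchall()
print(plan)  # the detail column should name idx_orders_active
```

Drop the `status = 'active'` conjunct from the query and the index is no longer eligible — the planner cannot prove the predicate holds, which is exactly the "predicates that drift" failure mode from the table.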

Indexing rules

  • Index the columns you filter or join on, not the ones you select.
  • Composite index order matters: `(user_id, created_at)` serves `WHERE user_id = ? ORDER BY created_at`, not the reverse.
  • An index on a low-cardinality column (boolean, enum-of-3) rarely earns its write cost — consider a partial index.
  • `EXPLAIN (ANALYZE, BUFFERS)` is the ground truth. Never tune blind.
  • The N+1 query is the #1 interview-day bug. Detect it with query logs; fix it with a `JOIN` or a batched `IN (…)`.
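The N+1 rule is easiest to see side by side. A sketch with SQLite and an invented two-table schema — the first version issues one query per user (N+1 round trips), the second fetches everything in one batched `IN (...)` and groups in application code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'lin'), (3, 'bob');
    INSERT INTO posts VALUES (1, 1, 'p1'), (2, 1, 'p2'), (3, 2, 'p3');
""")

user_ids = [row[0] for row in conn.execute("SELECT id FROM users")]

# N+1 shape: one query for the users, then one query PER user.
posts_n_plus_1 = {
    uid: [t for (t,) in conn.execute(
        "SELECT title FROM posts WHERE user_id = ?", (uid,)
    )]
    for uid in user_ids
}

# Fix: a single batched query, grouped in application code. 2 queries total.
placeholders = ",".join("?" for _ in user_ids)
rows = conn.execute(
    f"SELECT user_id, title FROM posts WHERE user_id IN ({placeholders})",
    user_ids,
).fetchall()
posts_batched = {uid: [] for uid in user_ids}
for uid, title in rows:
    posts_batched[uid].append(title)

assert posts_batched == posts_n_plus_1  # same data, N-1 fewer round trips
```

In an ORM the same fix usually goes by eager loading (`JOIN`-based or `IN`-based prefetch); the query-log symptom to watch for is the identical statement repeated with only the bound id changing.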