Databases
The four decisions interviewers push on: relational vs not, isolation level, replication mode, and sharding strategy. Plus a reminder on indexing.
The senior framing
"Which database" is the wrong first question. The right questions are: what
is the access pattern, what consistency do I need, what is the read/write
mix, and what is the failure mode I cannot tolerate? The database falls out
of the answers.
SQL vs NoSQL
SQL vs NoSQL — pick by access pattern, not by hype
| Criterion | Relational (Postgres, MySQL) | Non-relational (varies) |
|---|---|---|
| Data shape | Normalized tables with known schema; cross-entity joins cheap | Denormalized document / KV / wide-column; joins expensive or absent |
| Consistency | Strong by default; ACID transactions across rows and tables | Eventual by default in most distributed stores; tunable per op in some |
| Scale ceiling | Vertical first; horizontal needs explicit sharding strategy | Horizontal by design (DynamoDB, Cassandra); scale-out is the point |
| Query flexibility | Ad-hoc queries on any indexed column; analytics-friendly | Only the access patterns you designed for; re-design to add new ones |
| Best fit | OLTP, financial, anything with cross-entity integrity | High-volume KV, time-series, session stores, known-shape workloads |
Rule of thumb: Default to Postgres until you have a concrete reason not to. "We might need scale" is not a reason; millions of rows fit on one Postgres node.
Isolation levels
Ordered weakest → strongest. Know what each level allows, not just what it prevents — that is where bugs live.
| Level | Prevents | Still allows | Typical use |
|---|---|---|---|
| Read Uncommitted | Nothing beyond corruption | Dirty reads, non-repeatable reads, phantoms, lost updates | Never in practice. Many engines silently upgrade to Read Committed. |
| Read Committed | Dirty reads | Non-repeatable reads, phantoms, lost updates | Postgres default. Fine for most web apps — each statement sees a fresh snapshot. |
| Repeatable Read | Dirty reads, non-repeatable reads | Phantoms in SQL standard (MySQL InnoDB blocks them anyway via gap locks); still lost updates without SELECT FOR UPDATE | MySQL InnoDB default. Postgres Repeatable Read is actually snapshot isolation. |
| Snapshot Isolation | Dirty reads, non-repeatable reads, phantoms | Write skew (two transactions read overlapping sets, write non-overlapping, both commit) | Postgres Repeatable Read. Good general default; watch for write skew in banking-style invariants. |
| Serializable | All of the above including write skew | Nothing — equivalent to some serial order | Financial / invariant-critical paths. Highest cost; expect aborts and retry loops. |
Replication modes
| Mode | How it works | Consistency | Trade-off |
|---|---|---|---|
| Single-leader async | One primary accepts writes; replicas apply the WAL after commit. | Read-your-writes on primary; stale reads on replicas. | Simple + fast writes. Failover may lose uncommitted tail of the log. |
| Single-leader sync (or semi-sync) | Primary waits for ≥1 replica to ack before acknowledging client. | No data loss on failover of acked writes. | Write latency = primary + slowest synced replica. One slow replica stalls writes. |
| Multi-leader | Writes accepted at any leader; leaders replicate asynchronously to each other. | Conflicting writes require resolution (last-write-wins, CRDTs, app-level merge). | Good for multi-region write locality. Conflict resolution is the hard part. |
| Leaderless (quorum) | Client writes to W replicas, reads from R; overlap ensures freshness when W + R > N. | Tunable per request. Hinted handoff and read repair fix stragglers. | Complex failure modes; strong consistency is possible but expensive. DynamoDB, Cassandra. |
Sharding strategies
| Strategy | How it works | Pros | Cons |
|---|---|---|---|
| Range | Partition by key range: ids 0–1M → shard A, 1M–2M → shard B. | Range scans stay on one shard; easy to reason about. | Monotonic keys (timestamps, autoincrement) create hot-shard on inserts. |
| Hash | Partition by hash(key) mod N (or consistent hashing). | Even load distribution; no natural hotspots. | Destroys range-scan locality; rescaling is painful without consistent hashing. |
| Directory / lookup | A routing service stores a key → shard mapping. | Maximum flexibility; resharding is metadata-only. | Directory is a new bottleneck + single point of failure; needs its own scale plan. |
| Geographic | Partition by region / tenant jurisdiction. | Latency + data-residency (GDPR, HIPAA) solved in one move. | Cross-region queries are the slow path; rebalance is political, not just technical. |
Index types
| Type | Best for | Avoid for | Note |
|---|---|---|---|
| B-tree (default) | Equality + range + ORDER BY on indexed columns. | Full-text search, high-cardinality multi-column filters with arbitrary order. | Postgres / MySQL default. 99% of indexes you create are B-tree. |
| Hash | Pure equality lookup at scale. | Range, ORDER BY, prefix matching. | Rarely worth it — B-tree equality lookups are already fast. |
| GIN / inverted | Full-text, JSONB containment, arrays. | Low-cardinality columns where a bitmap scan is better. | Postgres GIN. Larger + slower to write than B-tree but enables @> and ts queries. |
| Covering (INCLUDE) | Index-only scans — query reads only indexed + included columns. | Wide payloads (defeats the memory-density point of an index). | Index stores extra non-key columns so the row lookup is skipped. |
| Partial | Indexing a hot subset (`WHERE status = 'active'`) to shrink index size. | Predicates that drift over time (requires reindex). | Huge wins when the hot slice is <5% of rows. |
Indexing rules
- Index the columns you filter or join on, not the ones you select.
- Composite index order matters: (user_id, created_at) serves WHERE user_id=? ORDER BY created_at, not the reverse.
- An index on a low-cardinality column (boolean, enum-of-3) rarely earns its write cost — consider a partial index.
- EXPLAIN (ANALYZE, BUFFERS) is the ground truth. Never tune blind.
- N+1 query is the #1 interview-day bug. Detect with query logs; fix with JOIN or batched IN (…).