SQL

Statements

Statement	Description	Example
WHERE	Filters rows based on a specified condition.	`SELECT * FROM table_name WHERE condition_1;`
AND	Combines multiple conditions in a WHERE clause, requiring all to be true.	`SELECT * FROM table_name WHERE condition_1 AND condition_2;`
OR	Combines multiple conditions in a WHERE clause, requiring at least one to be true.	`SELECT * FROM table_name WHERE condition_1 OR condition_2;`
LIMIT	Limits the number of rows returned by a query.	`SELECT * FROM table_name LIMIT 10;`
GROUP BY	Groups rows that have the same values into summary rows.	`SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name;`
ORDER BY	Sorts the result set by specified columns (ASC or DESC).	`SELECT * FROM table_name ORDER BY column_name ASC\|DESC;`
COUNT	Counts the number of rows in a specified column or all rows.	`SELECT COUNT(*) FROM table_name;`
AS	Aliases are used to rename a table or column in a SQL query result.	`SELECT column_name AS alias_name FROM table_name;`
INNER JOIN	Returns records that have matching values in both tables.	`SELECT * FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name;`
LEFT JOIN	Returns all records from the left table and the matched records from the right table.	`SELECT * FROM table1 LEFT JOIN table2 ON table1.column_name = table2.column_name;`
RIGHT JOIN	Returns all records from the right table and the matched records from the left table.	`SELECT * FROM table1 RIGHT JOIN table2 ON table1.column_name = table2.column_name;`
FULL JOIN	Returns all records when there is a match in either left or right table.	`SELECT * FROM table1 FULL JOIN table2 ON table1.column_name = table2.column_name;`

Indexing

Indexes trade write speed and disk space for read speed. Every index makes INSERT / UPDATE slower and takes storage — so index for the queries you actually run, not hypothetical ones.

Kind	Use for	Gotcha
B-tree (default)	Equality and range queries on ordered columns; most WHERE, ORDER BY, JOIN conditions	Leading-column rule: a composite index on (a, b, c) can serve queries on a, (a, b), or (a, b, c) — but NOT on b or c alone
Covering index	Queries whose SELECT list is fully included in the index — the DB never touches the table	Adding columns bloats the index; only cover when the query is hot
Partial index	"Only index active rows" / "only non-null rows" — shrinks the index and speeds writes	The WHERE predicate on the index must match the query predicate exactly
Hash index	Equality only, at O(1) — some engines (PG) have it, most default to B-tree	Useless for range queries or ORDER BY
GIN / inverted	Full-text search, JSONB, array containment	Write amplification — slower INSERTs; worth it only for search-heavy workloads
Unique index	Enforcing uniqueness + an index for lookups in one structure	On a nullable column, multiple NULLs are allowed in most engines (they compare unequal)

Senior insight

"This table has 20 indexes and writes are slow" is an interview anti-pattern — indexes compound. Prefer fewer, composite indexes chosen for the hot query shapes over many single-column indexes.

Transaction isolation

Isolation levels trade consistency guarantees for concurrency. Higher levels prevent more anomalies but serialize more work.

Level	Dirty read	Non-repeatable read	Phantom read	Note
Read Uncommitted	Possible	Possible	Possible	Rarely a good choice — you see uncommitted writes from other transactions
Read Committed	No	Possible	Possible	Default in most engines (PostgreSQL, Oracle). Fine for most OLTP workloads
Repeatable Read	No	No	Possible (in standard); PG implements snapshot isolation here, blocking phantoms too	Default in MySQL/InnoDB. Chosen when re-reading rows in a transaction must yield the same values
Serializable	No	No	No	Strongest. Cost: more aborts under contention (you must retry). Use for financial invariants

The anomalies in one sentence each

Dirty read: you see a value another transaction wrote and then rolled back.
Non-repeatable read: you re-read a row and get a different value because someone committed between your reads.
Phantom read: you re-run a range query and new rows appear that a concurrent transaction committed.
Lost update: two transactions read, modify, and write the same row; one overwrite is silently lost. Fix with SELECT … FOR UPDATE or optimistic version columns.

The N+1 query problem

A parent query returns N rows, then code loops issuing one child query per row — 1 + N round-trips to the DB. Ruins latency on anything more than a handful of rows.

Signature: You see the same child query in your DB logs repeated with different parameters, once per parent row.

Fixes

Join the child query into the parent (INNER / LEFT JOIN) when the result shape allows
Use `WHERE id IN (…)` with the list of parent IDs, then group in code
In ORMs, use eager-loading hints (`include`, `joinedload`, `preload`) — but verify with query logs; some ORMs still issue N queries under the hood

Why interviewers love this

Every backend engineer has shipped an N+1 at least once. Being able to describe the signature in DB logs (not just the abstract definition) shows you have actually debugged one, not just read about it.

Reading EXPLAIN

Signals to look for in a query plan, and the typical fix.

Signal	Meaning	Fix
Seq Scan on a large table	The planner is reading every row — either no usable index or the optimizer chose a scan because the predicate hits most rows	Add an index on the predicate column; check statistics with `ANALYZE`
Rows estimate vs actual off by 10×+	Stale statistics — the planner is making bad choices from outdated row counts	Run `ANALYZE` (PostgreSQL) or `ANALYZE TABLE` (MySQL); consider autovacuum tuning
Nested Loop with a large outer side	For each outer row, probing the inner — fine if outer is small, catastrophic otherwise	Hint or rewrite to encourage Hash Join or Merge Join; ensure join column is indexed on the inner side
Sort in memory or external merge	ORDER BY or DISTINCT forcing a sort that exceeds `work_mem` / equivalent	Add an index that already provides the order; raise memory limits per query only if needed
Filter removing most rows after the scan	The index (or lack of it) pulled in rows the WHERE clause then discards	Add a composite or partial index so the filter is applied by the index, not after

Pagination strategies

Strategy	Example	Pros	Cons
Offset / Limit	`SELECT * FROM items ORDER BY id LIMIT 20 OFFSET 1000`	Trivial to implement; supports jumping to page N	O(offset) — page 1000 reads + discards 20 000 rows. Also non-stable under concurrent writes — items shift between pages
Cursor / Keyset	`SELECT * FROM items WHERE id > :last_id ORDER BY id LIMIT 20`	O(limit) regardless of depth; stable under concurrent writes	No "jump to page N"; requires a strictly-ordered indexed column

Rule of thumb

Use cursor pagination for infinite scroll, API consumers, and anything that needs stable ordering. Offset is acceptable for admin consoles with small result sets and a "jump to page N" requirement.