Consistency & CAP
The theorems people cite wrong, the models people actually pick, and the read/write anomalies that force the decision.
The trap to avoid
Consistency models
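Ordered roughly from strongest to weakest. Read-your-writes and monotonic reads are per-client session guarantees and do not imply each other. Most systems are neither "strong" nor "eventual" but somewhere in between, and naming the model precisely is the senior signal.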
| Model | Guarantee | When it fits | Cost |
|---|---|---|---|
| Linearizable (strong) | Every operation appears to take effect atomically at one instant between its call and its return, so a read returns the most recent completed write. The system looks like a single copy. | Money, leader election, distributed locks, unique-constraint enforcement. | Quorum round-trip per op; split-brain-safe only via consensus (Raft, Paxos). |
| Sequential | All nodes see operations in the same total order, and that order preserves each client's own order of operations; it need not match real time. | Replicated state machines where per-client ordering matters but wall-clock doesn't. | Cheaper than linearizable but rare in off-the-shelf products. |
| Causal | If A happened before B (causally), all observers see A before B. Concurrent ops may be seen in any order. | Collaborative docs, comment threads, anywhere "reply appears before parent" is a bug. | Version vectors per key; metadata overhead grows with client count. |
| Read-your-writes | A client always sees its own prior writes. | User-facing UIs where posting a comment and refreshing must show it. | Sticky reads to primary or session tokens routed to up-to-date replica. |
| Monotonic reads | Successive reads never go backwards in time for one client. | Infinite scroll, dashboards polling a counter: keeps the number from dropping and then jumping back up. | Client pins to one replica or carries a read-timestamp cookie. |
| Eventual | If writes stop, all replicas eventually converge. Nothing said about order or window. | Caches, CDNs, low-stakes counters, anywhere eventual convergence is enough. | Cheapest; app must tolerate stale reads and conflicting writes. |
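The causal row relies on version vectors to decide whether one write happened before another. A minimal sketch, assuming each vector is a dict from a writer ID (replica or client) to a counter; the names `compare` and `merge` and their string results are illustrative, not any particular database's API.

```python
from typing import Dict

VersionVector = Dict[str, int]  # writer ID -> number of writes seen from that writer


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' (missing entries count as 0)."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"  # a causally precedes b
    if b_le_a:
        return "after"   # b causally precedes a
    return "concurrent"  # neither saw the other: a real conflict


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Pointwise max: the vector of a replica that has seen both histories."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


# A comment written on replica r1, then a reply written on r2 after r2 saw it.
parent = {"r1": 1}
reply = {"r1": 1, "r2": 1}
print(compare(parent, reply))
# Two edits made independently on different replicas.
print(compare({"r1": 2}, {"r2": 1}))
```

A causally consistent store would deliver `reply` only after `parent`, because `parent` compares as `before`; `concurrent` pairs are the ones the application or a merge rule must resolve. The vector carries one entry per writer, which is the metadata growth noted in the table.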
CAP theorem
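During a network partition (P), a distributed system must choose between Consistency (every read sees the latest write; in the formal theorem this means linearizability) and Availability (every request to a non-failed node gets a non-error response). It cannot guarantee both while partitioned.
The common misreading
CAP is not "pick any two of three." Partitions are not optional in a distributed system, so there is no real "CA" option; the C-versus-A choice applies only while a partition is in progress, and a healthy system can provide both.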
CP systems
ZooKeeper, etcd, HBase, MongoDB (majority writes)
AP systems
DynamoDB (default), Cassandra, Riak, CouchDB
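The CP/AP split comes down to what a replica does when it cannot reach a majority of its peers. A toy sketch, assuming a replica knows how many of the N replicas (itself included) it can currently reach; the `Replica` class and its `"CP"`/`"AP"` mode labels are hypothetical, and real CP systems get this behavior from a consensus protocol such as Raft rather than a simple count. Because CAP availability requires a non-error response, the CP branch's error counts as unavailable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Replica:
    n: int          # total replicas in the group
    mode: str       # "CP" or "AP" (hypothetical labels for this sketch)
    reachable: int  # replicas this node can reach right now, itself included
    data: Dict[str, str] = field(default_factory=dict)
    # AP only: writes accepted without a majority, to reconcile after the partition heals.
    pending: List[Tuple[str, str]] = field(default_factory=list)

    def write(self, key: str, value: str) -> str:
        majority = self.n // 2 + 1
        if self.reachable >= majority:
            # Healthy case: a real system would replicate to a majority before acking.
            self.data[key] = value
            return "ack"
        if self.mode == "CP":
            # Consistency first: refuse, so the two sides of the partition cannot diverge.
            return "error: no majority reachable"
        # Availability first: accept locally; replicas may now disagree until reconciled.
        self.data[key] = value
        self.pending.append((key, value))
        return "ack (not replicated)"


# The minority side of a 5-node cluster split 2 | 3.
print(Replica(n=5, mode="CP", reachable=2).write("x", "1"))
print(Replica(n=5, mode="AP", reachable=2).write("x", "1"))
```

The `pending` list is where an AP system's later conflict resolution (last-writer-wins, version vectors, CRDTs) would pick up.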
PACELC
PACELC extends CAP: if Partitioned, pick Availability or Consistency; Else (no partition), pick Latency or Consistency.
It explains the trade-off you make even in the sunny-day case. A system that chooses Availability during partitions usually also chooses Latency in normal operation; the two preferences tend to travel together, though not always (see PA/EC below).
PA/EL
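Cassandra, DynamoDB: available and low-latency; staleness is always the price.
PC/EC
Spanner, CockroachDB: always consistent, and always pay a latency tax (consensus round-trips on writes).
PA/EC
MongoDB: consistent in normal operation (reads and writes go to the primary by default), available under partition in its classic classification; writeConcern and readConcern tune it further in either direction.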
Read/write anomalies
| Anomaly | Scenario | Fixed by |
|---|---|---|
| Dirty read | T1 writes X=5; before commit, T2 reads X=5; T1 rolls back. T2 saw a value that never existed. | Read Committed or higher. |
| Non-repeatable read | T1 reads X=5; T2 writes X=6 and commits; T1 reads X=6 in the same transaction. | Repeatable Read / Snapshot Isolation. |
| Phantom | T1 reads "all rows where status='active'" twice; T2 inserts a new active row and commits in between, so T1's second read returns an extra row. | Serializable, or predicate/gap locks (e.g. MySQL InnoDB next-key locks). |
| Lost update | T1 and T2 both read balance=100, add 10, both write 110. Real answer is 120. | SELECT FOR UPDATE, optimistic lock with version column, or Serializable. |
| Write skew | Two doctors on call; each reads "at least one other doctor is on call" and takes themselves off. Now zero on call. | Serializable (snapshot isolation permits this by design). |
| Read skew | X=100 and Y=50 (total 150). T1 reads X=100; another txn moves 30 from X to Y and commits (X=70, Y=80); T1 then reads Y=80. T1 sees a total of 180, a state that never existed. | Snapshot Isolation (single read snapshot). |
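The lost-update row lists an optimistic lock with a version column as one fix. A minimal sketch using Python's built-in `sqlite3`, assuming a hypothetical `accounts(id, balance, version)` table and helper names `read`/`try_add`; both "transactions" share one connection for brevity, and the interleaving is reproduced by ordering the reads. Each writer updates only if the version it read is still current; an `UPDATE` that matches zero rows means the writer lost the race and must re-read.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
    conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")
    conn.commit()
    return conn


def read(conn: sqlite3.Connection, account_id: int):
    """Return (balance, version) for one account."""
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def try_add(conn, account_id, amount, seen_balance, seen_version) -> bool:
    """Compare-and-set: write only if nobody bumped the version since our read."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (seen_balance + amount, account_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows matched: someone else wrote first


conn = setup()
b1, v1 = read(conn, 1)  # T1 reads balance=100, version=0
b2, v2 = read(conn, 1)  # T2 reads the same
print(try_add(conn, 1, 10, b1, v1))  # T1 wins
print(try_add(conn, 1, 10, b2, v2))  # T2 is rejected instead of writing 110
b2, v2 = read(conn, 1)               # T2 re-reads and retries
print(try_add(conn, 1, 10, b2, v2))
print(read(conn, 1))
```

The final balance is 120 rather than 110: the stale writer is forced to retry instead of silently overwriting.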
Quorum math
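Dynamo-style quorums, with N replicas per key, W acks required per write and R replies required per read: W + R > N ⇒ every read quorum overlaps every write quorum, so a read contacts at least one replica holding the latest acknowledged write.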
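The overlap follows from a pigeonhole count. Writing $\mathcal{W}$ and $\mathcal{R}$ for the sets of replicas in a write quorum and a read quorum, with $|\mathcal{W}| = W$, $|\mathcal{R}| = R$, and both drawn from the same N replicas:

```latex
|\mathcal{W} \cap \mathcal{R}|
  = W + R - |\mathcal{W} \cup \mathcal{R}|
  \ge W + R - N
  \ge 1 \qquad \text{whenever } W + R > N
```

The last step holds because W, R and N are integers, so W + R > N means W + R − N ≥ 1.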
| Configuration | Effect |
|---|---|
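| W=N, R=1 | Fast reads, slow writes. Reads are always fresh, but a single down replica blocks every write. |
| W=1, R=N | Fast writes, slow reads; a single down replica blocks every read. Used when writes dominate and staleness is intolerable. |
| W=R=⌊N/2⌋+1 | Balanced majority quorum. The classic Dynamo configuration is N=3, W=R=2; in Cassandra this is QUORUM reads and writes at replication factor 3 (Cassandra's default consistency level is ONE, which gives W + R ≤ N). |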
| W + R ≤ N | Possible to read stale. Only acceptable when eventual consistency is explicitly OK. |
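The rows of this table can be checked by brute force. A minimal sketch, assuming each replica stores a (version, value) pair and a read returns the highest-version value among the R replicas it contacts; this simplifies Dynamo-style reads (no sloppy quorums, hinted handoff or failed partial writes), and `always_fresh` is a hypothetical helper that tries every possible write set and read set.

```python
from itertools import combinations


def always_fresh(n: int, w: int, r: int) -> bool:
    """True if every possible read quorum of size r sees a write acked by w replicas."""
    replicas = range(n)
    for write_set in combinations(replicas, w):
        # All replicas hold version 1; the new write (version 2) lands only on write_set.
        store = {i: (2, "new") if i in write_set else (1, "old") for i in replicas}
        for read_set in combinations(replicas, r):
            # The reader keeps the highest version among the replicas it contacted.
            _, value = max(store[i] for i in read_set)
            if value != "new":
                return False  # found a read quorum that missed the write entirely
    return True


for n, w, r in [(3, 3, 1), (3, 1, 3), (3, 2, 2), (3, 1, 1), (5, 2, 3), (5, 2, 2)]:
    print(f"N={n} W={w} R={r}: W+R>N={w + r > n}, always fresh={always_fresh(n, w, r)}")
```

Taking the highest version is what makes the overlap useful: without version numbers or timestamps, the reader would contact a replica holding the new value but could not tell which reply was the newest.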