Consistency & CAP

The theorems people cite wrong, the models people actually pick, and the read/write anomalies that force the decision.

The trap to avoid

Never say "we chose CP" or "we chose AP" without naming the partition scenario you are choosing against. The decision is only meaningful during a partition — and PACELC says there is a second decision waiting even when the network is healthy.

Consistency models

Ordered strongest → weakest. Most systems are not "strong" or "eventual" — they are somewhere in the middle, and naming the model precisely is the senior signal.

| Model | Guarantee | When it fits | Cost |
|---|---|---|---|
| Linearizable (strong) | Every read returns the most recent committed write; the system looks like a single copy. | Money, leader election, distributed locks, unique-constraint enforcement. | Quorum round-trip per op; split-brain-safe only via consensus (Raft, Paxos). |
| Sequential | All nodes see operations in the same total order; order need not match real time. | Replicated state machines where per-client ordering matters but wall-clock doesn't. | Cheaper than linearizable but rare in off-the-shelf products. |
| Causal | If A happened before B (causally), all observers see A before B; concurrent ops may be seen in any order. | Collaborative docs, comment threads, anywhere "reply appears before parent" is a bug. | Version vectors per key; metadata overhead grows with client count. |
| Read-your-writes | A client always sees its own prior writes. | User-facing UIs where posting a comment and refreshing must show it. | Sticky reads to the primary, or session tokens routed to an up-to-date replica. |
| Monotonic reads | Successive reads never go backwards in time for one client. | Infinite scroll, dashboards polling a counter — prevents the number going down then up. | Client pins to one replica or carries a read-timestamp cookie. |
| Eventual | If writes stop, all replicas eventually converge; nothing is said about order or window. | Caches, CDNs, low-stakes counters, anywhere eventual convergence is enough. | Cheapest; the app must tolerate stale reads and conflicting writes. |
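The session-token mechanism behind read-your-writes and monotonic reads can be sketched in a few lines. This is a minimal illustration, not any real database's API — `Replica`, `Session`, and `StaleReplica` are all hypothetical names. The client carries the highest version it has observed, and a replica refuses to serve a read older than that token:

```python
# Hedged sketch (all names hypothetical): a client-side version token enforcing
# read-your-writes and monotonic reads against lagging replicas.

class StaleReplica(Exception):
    pass

class Replica:
    def __init__(self):
        self.version = 0   # highest write version applied on this replica
        self.data = {}

    def apply(self, key, value, version):
        self.data[key] = value
        self.version = max(self.version, version)

    def read(self, key, min_version):
        # The guard: never serve a client a state older than it has seen.
        if self.version < min_version:
            raise StaleReplica(f"at v{self.version}, client needs v{min_version}")
        return self.data.get(key)

class Session:
    """Client-side token: highest version this client has written or read."""
    def __init__(self, replicas):
        self.replicas = replicas       # replicas[0] acts as the primary
        self.last_seen = 0

    def write(self, key, value):
        self.last_seen += 1
        self.replicas[0].apply(key, value, self.last_seen)  # followers lag

    def read(self, key):
        # Prefer followers for load; fall back toward the primary.
        for r in reversed(self.replicas):
            try:
                value = r.read(key, self.last_seen)
                self.last_seen = max(self.last_seen, r.version)  # monotonic
                return value
            except StaleReplica:
                continue
        raise StaleReplica("no sufficiently fresh replica reachable")

primary, follower = Replica(), Replica()
session = Session([primary, follower])
session.write("comment", "first!")          # follower has not replicated yet
assert session.read("comment") == "first!"  # stale follower skipped; primary serves
```

Real systems implement the same idea with sticky routing or a read-timestamp cookie rather than rejecting reads outright, but the invariant is identical: the replica must be at least as new as the client's token.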

CAP theorem

During a network partition (P), a distributed system must choose: Consistency (every read sees the latest write) or Availability (every request gets a response). You cannot have both while partitioned.

The common misreading

CAP is often stated as "pick 2 of 3" — that is wrong. P is not optional in real networks. The real trade-off is C vs A, and only during partitions.

CP systems

ZooKeeper, etcd, HBase, MongoDB (majority writes)

AP systems

DynamoDB (default), Cassandra, Riak, CouchDB

PACELC

PACELC extends CAP: if Partitioned, pick Availability or Consistency; Else (no partition), pick Latency or Consistency.

Explains the trade-off you make even in the sunny-day case. A system that chooses Availability during partitions usually also chooses Latency in normal operation — those two preferences travel together.

PA/EL

Cassandra, DynamoDB — available under partition, low-latency otherwise; staleness is the standing price in both cases.

PC/EC

Spanner, CockroachDB — consistent in both cases; the latency tax is paid even when the network is healthy.

PA/EC

MongoDB — leans toward availability under partition but consistency otherwise; writeConcern/readConcern tune where it actually lands.

Read/write anomalies

| Anomaly | Scenario | Fixed by |
|---|---|---|
| Dirty read | T1 writes X=5; before commit, T2 reads X=5; T1 rolls back. T2 saw a value that never existed. | Read Committed or higher. |
| Non-repeatable read | T1 reads X=5; T2 writes X=6 and commits; T1 reads X=6 in the same transaction. | Repeatable Read / Snapshot Isolation. |
| Phantom | T1 reads "all rows where status='active'" twice; T2 inserts a new active row in between. | Serializable, or predicate/gap locks (MySQL InnoDB). |
| Lost update | T1 and T2 both read balance=100, add 10, and both write 110. The real answer is 120. | SELECT ... FOR UPDATE, an optimistic lock with a version column, or Serializable. |
| Write skew | Two doctors on call; each reads "at least one other doctor is on call" and takes themselves off. Now zero are on call. | Serializable (Snapshot Isolation permits this by design). |
| Read skew | T1 reads X=100; another txn then moves 30 from X to Y and commits; T1 reads Y=80. T1 sees a total of 180 when the truth was 150 at every instant. | Snapshot Isolation (a single read snapshot). |
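The optimistic-lock fix for lost updates is worth seeing concretely. This is a hedged in-memory sketch of the version-column pattern, not a real database API (`Table` and `VersionConflict` are illustrative names): every row carries a version, and a write is a compare-and-set that fails if the version moved since the read.

```python
# Hedged sketch of optimistic locking: a write succeeds only if the row's
# version is unchanged since the read; otherwise the caller retries.

class VersionConflict(Exception):
    pass

class Table:
    def __init__(self, rows):
        self.rows = rows               # key -> (value, version)

    def read(self, key):
        return self.rows[key]

    def write(self, key, new_value, expected_version):
        _, version = self.rows[key]
        if version != expected_version:
            raise VersionConflict(key)          # a concurrent writer got there first
        self.rows[key] = (new_value, version + 1)

def add_10(table, key):
    while True:                                 # retry from a fresh read on conflict
        value, version = table.read(key)
        try:
            table.write(key, value + 10, version)
            return
        except VersionConflict:
            continue

t = Table({"balance": (100, 0)})
v, ver = t.read("balance")            # T1 reads 100 at version 0
add_10(t, "balance")                  # T2 sneaks in: reads 100, writes 110 at version 1
try:
    t.write("balance", v + 10, ver)   # T1's blind write is rejected, not silently lost
except VersionConflict:
    add_10(t, "balance")              # T1 retries from a fresh read
assert t.read("balance")[0] == 120    # both increments survive
```

The same compare-and-set shape appears in SQL as `UPDATE ... SET balance = :new, version = version + 1 WHERE id = :id AND version = :seen`, with a retry when zero rows are affected.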

Quorum math

Dynamo-style quorums: with N replicas, writes acknowledged by W nodes, and reads served by R nodes, W + R > N guarantees every read quorum overlaps the latest write quorum, so reads see the latest committed write (assuming strict rather than sloppy quorums).

| Configuration | Effect |
|---|---|
| W=N, R=1 | Fast, always-fresh reads; slow writes, and a single down node blocks every write. |
| W=1, R=N | Fast writes, slow reads; used when writes dominate but staleness is intolerable. |
| W = R = ⌊N/2⌋ + 1 | Balanced majority quorum; the usual QUORUM setting in Dynamo-style stores. |
| W + R ≤ N | Stale reads are possible; acceptable only when eventual consistency is explicitly OK. |
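The quorum arithmetic above is simple enough to encode directly. A small sketch (`majority` and `quorum_overlaps` are illustrative names, not any library's API):

```python
# Helpers for Dynamo-style quorum arithmetic.

def majority(n):
    """Smallest quorum size guaranteed to overlap with itself: floor(N/2) + 1."""
    return n // 2 + 1

def quorum_overlaps(n, w, r):
    """True iff every read quorum must intersect every write quorum: W + R > N."""
    return w + r > n

assert majority(3) == 2 and majority(5) == 3
assert quorum_overlaps(3, 3, 1)        # W=N, R=1: always-fresh reads
assert quorum_overlaps(3, 2, 2)        # balanced majority quorums
assert not quorum_overlaps(3, 1, 1)    # W + R <= N: stale reads possible
```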